[XviD-devel] multithreading

Michael Militzer michael at xvid.org
Thu Jun 25 16:37:07 CEST 2009


Hi Lars,

I don't think evil VfW is to blame here at all. It is just in general
rather difficult to make the xvid encoder itself scale over many cores.

Currently, xvid multi-threads motion estimation and mode decision (the tasks
that require the most CPU for high-quality encoding). Even this was not
trivial, however, because there are many interdependencies that require
syncing (macroblock vectors and modes cannot be estimated independently
but need their left, top and top-right neighbours to be processed already).
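
To illustrate the dependency, here is a minimal sketch (not xvid's actual
scheduler) of the resulting "wavefront" constraint: a macroblock at (x, y)
may only start once its left, top and top-right neighbours are finished.
The rows_done[] progress counters and the mb_ready() helper are made up
for the example:

#include <stdatomic.h>

typedef struct {
    int mb_width;              /* macroblocks per row                   */
    int mb_height;             /* macroblock rows                       */
    atomic_int *rows_done;     /* rows_done[y] = number of macroblocks
                                  already finished in row y             */
} wavefront_t;

/* Returns non-zero when macroblock (x, y) is allowed to start. */
static int mb_ready(wavefront_t *wf, int x, int y)
{
    /* Left neighbour: row y must have progressed past column x - 1. */
    if (atomic_load(&wf->rows_done[y]) < x)
        return 0;
    /* Top and top-right neighbours: row y - 1 must have progressed
       past column x + 1 (clamped at the right frame border).        */
    if (y > 0) {
        int need = (x + 2 < wf->mb_width) ? x + 2 : wf->mb_width;
        if (atomic_load(&wf->rows_done[y - 1]) < need)
            return 0;
    }
    return 1;
}

A worker thread processing row y would wait until mb_ready() returns
non-zero, estimate the macroblock, then increment rows_done[y]. The net
effect is a diagonal wave running across the frame, which is why the
achievable parallelism is bounded and the per-macroblock syncing is not
free.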

So when you report a 2.5 times speed-up on a three-core machine, I'd say
that is a pretty decent result. Note that according to my measurements,
you usually do not gain anything from using more encoder threads than you
have processor cores.

You are right that one could encode B-frames in parallel with a future I/P-
frame. However, that doesn't scale over many cores either: usually you
have just 1 or 2 consecutive B-frames, which means you can keep only 2-3
threads busy. One could extend that idea to encode several GOPs in
parallel as you suggest, but imho that cannot reasonably be handled within
xvid:

In addition to the problems Radek already mentioned, I'd also like to point
out the complexity of an implementation: for every frame that you encode in
your "look-ahead" buffers you need to keep the reconstructed picture for
future reference, the motion vectors and modes for prediction, the coded
bitstream and so on. Further, every API - not just VfW - provides you with
the input frames in display order. In order to encode future I/P-frames
you'd have to buffer all frames in-between until they can be scheduled for
coding as well. With a maximum I-frame distance of 300 frames (the xvid
default), this could mean hundreds of frames to buffer and enormous memory
consumption.
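
Just as a back-of-the-envelope sketch (assuming PAL DVD resolution and a
YV12 input format; the numbers are illustrative only), the raw input
frames alone would already take roughly 178 MB:

#include <stdio.h>

int main(void)
{
    const size_t width  = 720, height = 576;
    const size_t frame_bytes = width * height * 3 / 2;  /* YV12 = 12 bpp */
    const size_t frames = 300;                          /* max I-frame
                                                           distance      */

    printf("%zu bytes/frame, %.1f MB for %zu buffered frames\n",
           frame_bytes, frame_bytes * frames / (1024.0 * 1024.0), frames);
    /* -> 622080 bytes/frame, ~178.0 MB for 300 buffered frames */
    return 0;
}

And that is before the reconstructed reference pictures, motion vectors
and coded bitstream buffers mentioned above are accounted for.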

So I don't think it is reasonable to implement this in xvid when it can
be implemented a lot more easily (and with less overhead) at the application
level. If you have several streams to batch encode, simply start a thread
for every task and run them in parallel. Many applications allow you to
do this (e.g. Handbrake).

If you want to encode one stream on multiple cores, you could split that
video into e.g. four parts, run four xvid encoder instances in parallel
and finally join the results. That's a method that scales easily and I
think there are a number of applications that can do this (Transcode?).
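
As a rough sketch of how an application could drive this (encode_segment()
is a hypothetical helper that would open its own, independent xvid encoder
instance and encode its frame range to its own output file), the per-part
work maps naturally onto one thread per segment:

#include <pthread.h>

#define NUM_PARTS 4

typedef struct {
    int start_frame;            /* first frame of this segment          */
    int end_frame;              /* one past the last frame              */
    const char *output_path;    /* where this segment's stream goes     */
} segment_t;

int encode_segment(const segment_t *seg);    /* hypothetical helper */

static void *segment_worker(void *arg)
{
    encode_segment((const segment_t *)arg);
    return NULL;
}

int encode_in_parallel(segment_t segs[NUM_PARTS])
{
    pthread_t threads[NUM_PARTS];
    int i;

    for (i = 0; i < NUM_PARTS; i++)
        pthread_create(&threads[i], NULL, segment_worker, &segs[i]);
    for (i = 0; i < NUM_PARTS; i++)
        pthread_join(threads[i], NULL);

    /* The NUM_PARTS partial output files are then concatenated or
       remuxed by the application into the final stream.             */
    return 0;
}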

Your idea with the 2-pass stats works too: you could split the input stream
into e.g. four parts and run four xvid encoder instances in parallel that do
the first pass. You then end up with four first-pass stat files. Your app
then joins the stat files and analyzes the relative complexity of each of
the sub-parts. Based on that analysis you assign target bitrates or file
sizes to each of the four parts, launch four xvid encoder instances in
second-pass mode and again join everything together at the end.
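
The bitrate-assignment step could then look roughly like this (a sketch
only: the per-part complexity is assumed to be e.g. the summed first-pass
frame sizes, which is not necessarily how your app would measure it):

#include <stdint.h>

#define NUM_PARTS 4

/* Give each part a share of the total target size proportional to its
   measured first-pass complexity.                                      */
void assign_targets(const uint64_t complexity[NUM_PARTS],
                    uint64_t total_target_bytes,
                    uint64_t target_bytes[NUM_PARTS])
{
    uint64_t total = 0;
    int i;

    for (i = 0; i < NUM_PARTS; i++)
        total += complexity[i];

    for (i = 0; i < NUM_PARTS; i++)
        target_bytes[i] = total ? total_target_bytes * complexity[i] / total
                                : total_target_bytes / NUM_PARTS;
}

Each of the four second-pass instances would then be given target_bytes[i]
as its desired output size for its part of the stream.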

I remember that this idea has already been discussed in the past, so there
may already be a tool that can do this for you. In any case, this
method should give good quality (as long as you don't partition the input
into too many small pieces), scales well, is fast and is really easy to
implement (at the application level).

So actually, launching multiple encoder instances at the app level should
always give the best results - if it's possible. For applications like
capturing from TV or video conferencing, however, the partitioning trick
doesn't work. In those cases, xvid's multi-threading option will still
offer you a decent speed-up.

Regards,
Michael


Quoting Lars Täuber <lars.taeuber at web.de>:

> Hi Radek,
>
> Radek Czyz <radoslaw at syskin.cjb.net> wrote:
>>  > - initialize two buffers: one for future I-frames, one for future P-frames
>>
>> You're assuming the two GOPs are independent, and they only are in
>> constant-quantizer mode. In any other rate control, overflow/underflow
>> from one GOP has to be incorporated into the compression strength of the
>> following GOP.
>
> yes, you're right, I'm using quantized encodings with different
> quantizers for each frame type.
> I'd like xvid to use more of my CPU cores to encode faster. To me it
> seems there is much more power left unused on my box.
> But I realize that can't be done with this Windows compatibility in mind.
>
> How much work would a second API for non-VfW systems be, together with an
> encoding scheme like the one I suggested for quantized encodings?
>
> I wonder if the application (avidemux) could somehow emulate my
> suggested behaviour by starting parallel encodings of single GOPs:
> it would read the first-pass log file and cut it into pieces of GOPs
> at occurrences of I-frames. In this situation it would be helpful to
> have a minimal and fast first run of xvid just to learn the order of
> frame types.
> But then the first I-frame of the following GOP can't be used for
> B-frame calculation at the end of the current GOP. That wouldn't be
> as efficient as the normal run, would it?
>
> Regards
> Lars
> --
> Lars Täuber <lars.taeuber at web.de>
> _______________________________________________
> Xvid-devel mailing list
> Xvid-devel at xvid.org
> http://list.xvid.org/mailman/listinfo/xvid-devel
>




