[XviD-devel] Multithreaded Motion Search - looking for benchmarker

Bryan Mayland bmayland at leoninedev.com
Wed Jul 27 22:42:53 CEST 2005


While I was lying in bed the other night thinking about how much time it 
takes to encode my 720x480 MPEG2 television recordings in 2-pass XviD, I 
considered a processor upgrade.  That of course led me to wonder about 
the performance of XviD on a dual core processor.  I know that 
xvidcore itself is single threaded for safety and simplicity, but maybe 
it would benefit from running multithreaded in the area where 50% of 
encoding time is spent (on my system): Motion Estimation.

This was brought up in a thread in Sept 04 "[XviD devel] Changes to 
get_pmv2", but it sort of died when gruel said that the overhead of the 
threading negated any performance gains.

Since I don't take no for an answer, I have hacked (and when I say 
hacked I mean this is some really ugly code) support for a win32 
dual-threaded MotionEstimation() against CVS head.  Yeah, win32.  I do 
all of my encoding under Linux, but I'm just a better Windows programmer 
when it comes to threading.  The new code still works on my system, with 
what looks like only a minor speed hit.  However, I'd like to see if 
performance is increased (theoretically by up to 25%) on an SMP or 
multicore system.
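
For reference, the 25% ceiling is what you get if Motion Estimation really 
is 50% of encoding time and splits perfectly across two threads with no 
overhead.  A minimal sketch of that arithmetic (the function name is made 
up for illustration, and perfect scaling is an assumption):

```python
def relative_encode_time(me_fraction, threads):
    """Encode time relative to single-threaded, if only ME is parallelized.

    Assumes ME scales perfectly across threads and threading costs nothing.
    """
    return (1.0 - me_fraction) + me_fraction / threads


# 50% of time in ME, two threads: half the work is untouched, half is halved.
time_saved = 1.0 - relative_encode_time(0.5, 2)
print(f"best-case time saved: {time_saved:.0%}")
```

Any threading overhead or imperfect load balancing eats into that number.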

The algorithm creates a second thread to help with the ME, although 
there is no reason this could not be increased further.  An array 
[mb_width x mb_height] of "completed blocks" is initialized to 0, and an 
array of "available blocks" is initialized to point to the block at 
(0,0). Once a thread has completed the search of a block, that block is 
flagged as completed and the algorithm then checks two neighbors: the 
block to the right (block2), which can be searched if block2's top right 
neighbor is complete, and the block to the bottom left (block3), which 
can be searched if block3's left neighbor is complete.  Blocks that pass 
the check are added to the "available" list.  If there are blocks on the 
list, the thread takes one and starts again.
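
The bookkeeping described above can be sketched with a single worker and 
no real threads.  This is only an illustration, not the code in the DLL: 
the function name is made up, the motion search itself is left out, 
neighbors outside the frame are assumed to count as complete, and the 
frame is assumed to be at least 2 macroblocks wide.

```python
from collections import deque


def wavefront_order(mb_width, mb_height):
    """Order in which one worker would search blocks (assumes mb_width >= 2)."""
    completed = [[False] * mb_width for _ in range(mb_height)]
    available = deque([(0, 0)])  # the "available blocks" list starts at (0,0)
    order = []

    def done(x, y):
        # Assumption: a neighbor outside the frame counts as complete.
        if x < 0 or x >= mb_width or y < 0:
            return True
        return completed[y][x]

    while available:
        x, y = available.popleft()
        # ... the motion search for block (x, y) would run here ...
        completed[y][x] = True
        order.append((x, y))
        # block2: block to the right, ready if its top right neighbor is done
        if x + 1 < mb_width and done(x + 2, y - 1):
            available.append((x + 1, y))
        # block3: block to the bottom left, ready if its left neighbor is done
        if x >= 1 and y + 1 < mb_height and done(x - 2, y + 1):
            available.append((x - 1, y + 1))
    return order


print(wavefront_order(3, 2))
```

With real threads, the completed array and the available list would need 
to be protected by a lock or similar, which this sketch does not show.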

This is kind of a rough explanation, and I'll go into more detail if this 
works, but in a nutshell what happens is thread1 searches (0,0) then 
(1,0), sets (0,1) as available and keeps going right.  Thread2 picks up 
on row 1 and starts moving right along that row.  More CPUs/threads 
would just jump on more rows as they are ready so the entire motion 
search actually completes in a diagonal pattern top left to bottom right.
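
To see the diagonal pattern, here is a toy lockstep simulation.  It 
assumes every block takes the same time to search, that a block is ready 
once its left and top right neighbors are complete (out-of-frame 
neighbors count as complete), and that the frame is at least 2 
macroblocks wide.  The function name and the lockstep model are 
illustrative only; real threads would not run in neat steps.

```python
def simulate(mb_width, mb_height, workers):
    """Return (total_steps, step_of) where step_of[(x, y)] is the step at
    which block (x, y) is searched, with up to `workers` blocks per step."""
    step_of = {}

    def done(x, y):
        # Assumption: a neighbor outside the frame counts as complete.
        if x < 0 or x >= mb_width or y < 0:
            return True
        return (x, y) in step_of

    remaining = [(x, y) for y in range(mb_height) for x in range(mb_width)]
    step = 0
    while remaining:
        # Blocks whose left and top right neighbors finished in earlier steps.
        ready = [(x, y) for (x, y) in remaining
                 if done(x - 1, y) and done(x + 1, y - 1)]
        batch = ready[:workers]  # upper rows get priority
        for block in batch:
            step_of[block] = step
        remaining = [b for b in remaining if b not in batch]
        step += 1
    return step, step_of


steps, step_of = simulate(4, 3, workers=2)
for y in range(3):
    print(" ".join(f"{step_of[(x, y)]:2d}" for x in range(4)))
```

Each printed number is the step at which that block gets searched, so 
equal numbers in different rows are blocks being searched at the same 
time.  Each row starts two steps behind the row above it, which is where 
the diagonal comes from.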

I actually wrangled up a dual Pentium II-450 system and gave this a try.
Source: 512x480, 29.97fps XviD material, 34,400 frames, encoded using 
VirtualDubMod.  Default options except for b-frames disabled.
=========================================
Standard CVS head:  1hr 20mins (7.1fps)
My code:            1hr  6mins (8.7fps)
=========================================
Not bad considering this is just ugly unoptimized code.
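
As a quick sanity check on those numbers, the frame count and the two 
frame rates can be turned back into wall-clock time and a speedup figure. 
Only the values quoted above are used; the variable names are made up.

```python
frames = 34400
fps_cvs_head = 7.1
fps_threaded = 8.7

minutes_cvs_head = frames / fps_cvs_head / 60
minutes_threaded = frames / fps_threaded / 60

# Fraction of encode time saved, comparable to the "up to 25%" estimate.
time_saved = 1.0 - fps_cvs_head / fps_threaded

print(f"CVS head: {minutes_cvs_head:.1f} min, threaded: {minutes_threaded:.1f} min")
print(f"time saved: {time_saved:.1%}")
```

The computed times land close to the reported 1hr 20mins and 1hr 6mins, 
and the time saved comes out somewhat under the 25% best case.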

Is there anyone who has a win32 SMP or multicore machine that would be 
willing to benchmark my xvidcore.dll against the standard CVS head?
http://capnbry.net/~bmayland/fi/xvidcore.zip

