[XviD-devel] Multithreaded Motion Search - looking for benchmarker
Bryan Mayland
bmayland at leoninedev.com
Wed Jul 27 22:42:53 CEST 2005
While I was lying in bed the other night thinking about how much time it
takes to encode my 720x480 MPEG2 television recordings in 2-pass XviD, I
considered a processor upgrade. That of course led me to wonder about
the performance of XviD on a dual core processor. I know that
xvidcore itself is single threaded for safety and simplicity, but maybe
it would benefit from running multithreaded in the area where 50% of
encoding time is spent (on my system): Motion Estimation.
This was brought up in a thread in Sept 04 "[XviD devel] Changes to
get_pmv2", but it sort of died when gruel said that the overhead of the
threading negated any performance gains.
Since I don't take no for an answer, I have hacked (and when I say
hacked I mean this is some really ugly code) support for a win32
dual-threaded MotionEstimation() against CVS head. Yeah, win32. I do
all of my encoding under Linux, but I'm just a better Windows programmer
when it comes to threading. The new code still works on my system, with
what looks like only a minor speed hit. However, I'd like to see if
performance is increased (theoretically by up to 25%, if the 50% spent
in ME is split evenly across two CPUs) on an SMP or multicore system.
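To show where the 25% figure comes from, here is a minimal sketch of the arithmetic in Python. The function name is made up for illustration, and it assumes the ME share of the work splits perfectly across threads with no threading overhead, which is the best case and not something measured here.

```python
def remaining_time_fraction(parallel_fraction, threads):
    """Fraction of the original encode time left after threading one part.

    parallel_fraction: share of total time spent in the threaded part
                       (0.5 for ME on the system described above).
    threads:           number of threads sharing that part evenly.
    Assumes zero threading overhead, so this is an upper bound on the gain.
    """
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction / threads


if __name__ == "__main__":
    left = remaining_time_fraction(0.5, 2)
    print(f"time remaining: {left:.2f} of original")
    print(f"time saved:     {1.0 - left:.0%}")
```

With 50% of the time in ME and two threads, 75% of the original encode time remains, which is the "up to 25%" above. Adding more threads only shrinks the ME half, so the total can never drop below 50% unless something else is threaded too.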
The algorithm creates a second thread to help with the ME, although
there is no reason this could not be increased further. An array
[mb_width x mb_height] of "completed blocks" is initialized to 0, and an
array of "available blocks" is initialized to point to the block at
(0,0). Once a thread has completed the search for a block, the block is
flagged as completed and the algorithm checks whether the block to the
right can be searched (true if that block's top right neighbor is
complete), and whether the block to the bottom left can be searched
(true if that block's left neighbor is complete). Any such blocks are
added to the "available" list. If there are blocks on the list, the
thread takes one and starts again.
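The scheduling described above can be sketched roughly as follows. This is a single-threaded Python illustration of the bookkeeping only, not the win32 code in the DLL. It assumes each block depends on its left, top and top-right neighbors (inferred from the two checks above), and it adds a check of the block directly below so that the last column and one-block-wide frames do not stall; that edge handling and all names are my own assumptions.

```python
from collections import deque


def ready(done, x, y, w):
    """True if every existing neighbor that (x, y) depends on is complete."""
    deps = [(x - 1, y), (x, y - 1), (x + 1, y - 1)]  # left, top, top-right
    return all(done[dy][dx] for dx, dy in deps if 0 <= dx < w and dy >= 0)


def search_order(w, h):
    """Return the order in which one worker would search a w x h block grid."""
    done = [[False] * w for _ in range(h)]   # the "completed blocks" array
    available = deque([(0, 0)])              # the "available blocks" list
    queued = {(0, 0)}                        # guards against double-queuing
    order = []
    while available:
        x, y = available.popleft()
        # The real motion search for macroblock (x, y) would run here.
        done[y][x] = True
        order.append((x, y))
        # Candidates to release: right, bottom-left, and (assumption) below.
        for nx, ny in ((x + 1, y), (x - 1, y + 1), (x, y + 1)):
            in_frame = 0 <= nx < w and ny < h
            if in_frame and (nx, ny) not in queued and ready(done, nx, ny, w):
                queued.add((nx, ny))
                available.append((nx, ny))
    return order


if __name__ == "__main__":
    print(search_order(4, 3))
```

In a real multithreaded version the two arrays would need a lock (or atomic updates) around the "flag complete, then release neighbors" step, since two threads can finish neighboring blocks at nearly the same time.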
This is kinda a rough explanation, and I'll go into more detail if this
works, but in a nutshell what happens is thread1 searches (0,0) then
(1,0), sets (0,1) as available and keeps going right. Thread2 picks up
on row 1 and starts moving right along that row. More CPUs/threads
would just jump on more rows as they become ready, so the entire motion
search actually completes in a diagonal pattern, top left to bottom right.
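To picture the diagonal pattern, here is a small Python sketch that computes the earliest time step at which each block could be searched. It assumes every block takes one time step, that there are as many threads as needed, and that a block waits on its left, top and top-right neighbors; the function names are made up for illustration.

```python
def wavefront_steps(w, h):
    """Earliest step at which each block of a w x h grid can be searched."""
    steps = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            deps = [(x - 1, y), (x, y - 1), (x + 1, y - 1)]  # left, top, top-right
            prev = [steps[dy][dx] for dx, dy in deps if 0 <= dx < w and dy >= 0]
            # A block can start one step after its slowest dependency.
            steps[y][x] = 1 + max(prev) if prev else 0
    return steps


def max_parallel_blocks(steps):
    """Largest number of blocks that share the same time step."""
    counts = {}
    for row in steps:
        for s in row:
            counts[s] = counts.get(s, 0) + 1
    return max(counts.values())


if __name__ == "__main__":
    grid = wavefront_steps(6, 4)
    for row in grid:
        print(" ".join(f"{s:2d}" for s in row))
    print("most blocks in flight at once:", max_parallel_blocks(grid))
```

Under these assumptions block (x, y) lands on step x + 2*y, so each row trails the row above it by two blocks. That lag is what limits how many threads can be kept busy at once on a frame of a given size.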
I actually wrangled up a dual Pentium II-450 system and gave this a try:
512x480, 29.97 fps XviD source material, 34,400 frames, using VirtualDubMod.
Default options, except with B-frames disabled.
=========================================
Standard CVS head: 1hr 20mins (7.1fps)
My code: 1hr 6mins (8.7fps)
=========================================
Not bad, considering this is just ugly, unoptimized code.
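For anyone who wants to compare their own run against these numbers, the arithmetic is sketched below in Python. The frame count and elapsed times are the ones quoted above; the helper names are made up, and since the times are rounded to the minute the recomputed frame rates may differ from the quoted ones in the last digit.

```python
FRAMES = 34_400          # frames in the test clip
BASELINE_MINUTES = 80    # standard CVS head: 1hr 20mins
THREADED_MINUTES = 66    # dual-threaded ME build: 1hr 6mins


def fps(frames, minutes):
    """Average encode rate in frames per second."""
    return frames / (minutes * 60)


def time_saved_fraction(base_minutes, new_minutes):
    """Share of the baseline wall-clock time that the new build saves."""
    return (base_minutes - new_minutes) / base_minutes


if __name__ == "__main__":
    print(f"baseline: {fps(FRAMES, BASELINE_MINUTES):.2f} fps")
    print(f"threaded: {fps(FRAMES, THREADED_MINUTES):.2f} fps")
    saved = time_saved_fraction(BASELINE_MINUTES, THREADED_MINUTES)
    print(f"time saved: {saved:.1%}")
```

That works out to 17.5% less wall-clock time on the dual PII-450, against a best case of 25% if ME really is half the encode time and splits perfectly across two CPUs.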
Is there anyone who has a win32 SMP or multicore machine that would be
willing to benchmark my xvidcore.dll against the standard CVS head?
http://capnbry.net/~bmayland/fi/xvidcore.zip