[XviD-devel] multithread rework

Radek Czyz radoslaw at syskin.cjb.net
Tue Apr 22 15:44:41 CEST 2008


Hello, I'm the author of this code (don't laugh ;P) so I should be able 
to help.

First, this code doesn't actually divide the picture into slices. Output
is identical regardless of the number of threads - this was my design
goal and that's the reason why it works the way it does. I'm not
particularly attached to this design goal though, it was more of a "can
it be done" experiment.

In order to encode a macroblock, a thread needs the macroblock above and
to the right of it to be already encoded.

So, two things happen:

* each thread updates a count of "how many macroblocks are encoded in 
this row" under its "complete_count_self" memory location.

* each thread encodes no more blocks than "complete_count_above", which 
is again a memory location and it's updated by whatever thread works on 
the macroblock row above it.
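
The two rules above can be sketched roughly as follows. This is only an
illustration, not the code from CVS: max_encodable() and
encode_while_allowed() are hypothetical names, the exact "count above
minus one" limit is my assumption derived from the top/right dependency,
and C11 atomics stand in for the plain memory locations so that the
sketch is well defined.

```c
#include <stdatomic.h>

/* Hypothetical helper: how many macroblocks of the current row may be
 * encoded when `above_done` macroblocks of the row above are finished.
 * Macroblock x needs macroblock x+1 of the row above (top/right), so it
 * needs above_done >= x + 2. The last macroblock has no top/right
 * neighbour and only needs the row above to be complete. */
int max_encodable(int above_done, int mb_width)
{
    if (above_done >= mb_width)
        return mb_width;               /* row above is finished */
    return above_done > 0 ? above_done - 1 : 0;
}

/* One pass over a row: encode while allowed and publish progress after
 * every macroblock. Returns the updated count for this row. */
int encode_while_allowed(atomic_int *complete_count_self,
                         atomic_int *complete_count_above,
                         int mb_width, int current_mb)
{
    int max_mbs = max_encodable(atomic_load(complete_count_above),
                                mb_width);

    while (current_mb < max_mbs) {
        /* ... encode macroblock `current_mb` of this row here ... */
        current_mb++;
        atomic_store(complete_count_self, current_mb);
    }
    return current_mb;
}
```

With 5 macroblocks finished in the row above, a row of width 10 may
encode its first 4 macroblocks and then has to wait for more progress.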

Thread initialization code ensures that the threads for row N and row
N+1 share the same pointer in order to communicate. The single memory
location is updated by the thread for row N and read by the thread for
row N+1. All synchronization happens with this set of memory locations,
nothing more is needed.

yield() simply means that the current thread can't encode a block
because the thread above it didn't manage to encode in time (or perhaps
we're getting an older value because of caches, in which case there's
still no harm done).
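
For illustration, here is a minimal self-contained sketch of the whole
scheme: the pointer sharing done at thread initialization and the yield
when the row above hasn't progressed far enough. It is not the actual
XviD code. It assumes one thread per macroblock row, made-up frame
dimensions (MB_WIDTH, MB_HEIGHT), hypothetical names row_ctx,
encode_row() and encode_frame(), and C11 atomics in place of the plain
memory locations.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MB_WIDTH  8   /* macroblocks per row (illustrative value) */
#define MB_HEIGHT 4   /* rows; one thread per row in this sketch */

typedef struct {
    atomic_int *complete_count_self;   /* written by this row's thread */
    atomic_int *complete_count_above;  /* written by the row above */
    int yields;                        /* how often this row had to wait */
} row_ctx;

static void *encode_row(void *arg)
{
    row_ctx *ctx = arg;
    int current_mb = 0;

    while (current_mb < MB_WIDTH) {
        int above = atomic_load(ctx->complete_count_above);
        /* Assumed limit: top/right must be done, except at the edge. */
        int max_mbs = above >= MB_WIDTH ? MB_WIDTH : above - 1;

        if (current_mb >= max_mbs) {
            /* current workload is zero: the row above isn't far enough.
             * A stale (smaller) value only makes us wait a bit longer. */
            ctx->yields++;
            sched_yield();
            continue;
        }
        /* ... encode macroblock `current_mb` of this row here ... */
        current_mb++;
        atomic_store(ctx->complete_count_self, current_mb);
    }
    return NULL;
}

/* Runs one "frame"; returns the number of completed rows. */
int encode_frame(int *total_yields)
{
    atomic_int counts[MB_HEIGHT + 1];
    row_ctx ctx[MB_HEIGHT];
    pthread_t tid[MB_HEIGHT];
    int n, created = 0, complete = 0;

    /* counts[0] is a virtual, already finished row above row 0. */
    atomic_init(&counts[0], MB_WIDTH);
    for (n = 1; n <= MB_HEIGHT; n++)
        atomic_init(&counts[n], 0);

    for (n = 0; n < MB_HEIGHT; n++) {
        /* Row n writes counts[n + 1]; row n + 1 reads the same location. */
        ctx[n].complete_count_above = &counts[n];
        ctx[n].complete_count_self  = &counts[n + 1];
        ctx[n].yields = 0;
        if (pthread_create(&tid[n], NULL, encode_row, &ctx[n]) != 0)
            break;
        created++;
    }

    *total_yields = 0;
    for (n = 0; n < created; n++) {
        pthread_join(tid[n], NULL);
        *total_yields += ctx[n].yields;
        if (atomic_load(&counts[n + 1]) == MB_WIDTH)
            complete++;
    }
    return complete;
}
```

Calling encode_frame(&yields) should return MB_HEIGHT once every row has
finished. The yield count varies from run to run, since it depends on
how the scheduler interleaves the threads.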

Hopefully this helps
Radek


Con Kolivas wrote:
> Hi all
> 
> I see not a lot of activity has been going on for a while, but I noticed there 
> was multithread code in the cvs repo. I downloaded it and gave it a shot and 
> found it did speed up maybe 40% but that's not a great improvement for a 4 
> core machine. So I had a look at the code to see if I could help.
> 
> Now within the code I see each P frame is sliced up and encoded by different 
> threads. What I also found was the sched_yield was used as a locking 
> primitive which unfortunately is less than ideal. Turning the sched_yield 
> into a noop makes it spin on wait burning cpu cycles doing nothing and 
> actually slows the encoding down a lot. I'd like to help improve this 
> threading code with some proper locking in the hope I could speed it up. 
> However, I was hoping someone could tell me, exactly what is it that the 
> sched_yield is giving time to; ie by yielding, what is it waiting on 
> happening.
> 
> I could poke around indefinitely and I'll probably eventually find it but at 
> the moment it's too obscure for me to decipher from this:
> 
> 				if (current_mb >= max_mbs) {
> 					/* current workload is zero */
> 					x--;
> 					sched_yield();
> 					continue;
> 				}
> 
> Hopefully someone can enlighten me because I'd love to see if I could help 
> make this multithreaded code scale better.
> 
> Regards.

