[XviD-devel] Profiling XVID, Part II

Christoph Lampert chl at math.uni-bonn.de
Sun Mar 2 14:21:38 CET 2003


On Sat, 1 Mar 2003, Michael Militzer wrote:
> again: forget it. Memory transfers are unfortunately even more dominant for 
> decoding than for encoding. And you have just profiled decoding with yv12 
> output. Just try the same for rgb output: rgb conversion needs more time than 
> the whole decoding process...

I never use RGB output, and since graphics cards provide overlay, I think
nobody else should, either :)

Still, I just wanted to post a quick result. I'm no ASM guru, and it took me
quite a while to debug the few instructions, but I finally ported AMD's
fast-memcpy example for Athlon, which prefetches complete 8K blocks of
memory instead of just a few bytes in every iteration.
Maybe we can't use it in XVID, since we have to skip the padding areas, but
after all it was just a test of prefetching:

Athlon XP 1.4GHz (hardware prefetch, 64Byte cacheline, DDR-PC2100) 

glibc memcpy()                                         3.250s   146 MB/s
with MOVQ                                              3.080s   154 MB/s
AMD reference (fistful of cache for Athlon)            0.690s   689 MB/s
arjanv's MOVNTQ (without prefetch, for Athlon)         0.830s   573 MB/s
arjanv's MOVNTQ (with prefetch, for Athlon)            0.830s   573 MB/s
arjanv's interleaved MOVQ/MOVNTQ without prefetchNTA   1.110s   428 MB/s
arjanv's interleaved MOVQ/MOVNTQ with prefetchNTA      0.840s   566 MB/s

Btw., according to AMD, 1976 MB/s is possible with an XP 1800+.


PentiumIII 700MHz (32Byte cacheline, SDR-PC100)

glibc memcpy()                                         5.960s    79 MB/s
with MOVQ                                              8.280s    57 MB/s
AMD reference (fistful of cache)                       1.290s   368 MB/s
arjanv's MOVNTQ (without prefetch)                     2.320s   205 MB/s
arjanv's MOVNTQ (with prefetch)                        2.250s   211 MB/s
arjanv's interleaved MOVQ/MOVNTQ without prefetchNTA   2.970s   160 MB/s
arjanv's interleaved MOVQ/MOVNTQ with prefetchNTA      2.290s   207 MB/s
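
For reading the tables: the MB/s column is simply the amount copied divided
by the elapsed time. The total size is not stated here, but time multiplied
by rate comes to roughly 475 MB in every row. A minimal sketch of how such
numbers could be produced follows; throughput_mb_s() and time_copies() are
hypothetical names, and clock() is only one of several possible timers, not
necessarily what was used for the figures above.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Throughput in MB/s from the amount copied (in MB) and the elapsed time. */
static double throughput_mb_s(double megabytes, double seconds)
{
    return megabytes / seconds;
}

/* Hypothetical harness: run `copy` `reps` times over n bytes and return
 * the elapsed CPU time in seconds as reported by clock(). */
static double time_copies(void *(*copy)(void *, const void *, size_t),
                          void *dst, const void *src, size_t n, int reps)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < reps; i++)
        copy(dst, src, n);

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, throughput_mb_s(475.0, 0.690) gives about 688 MB/s, which
matches the AMD reference row on the Athlon up to rounding.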


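The block-prefetch idea itself, reduced to plain C, looks roughly like the
sketch below. This is not the ported ASM and not AMD's code: it only shows
the two-pass structure (first touch one byte per cache line of an 8K block
so the whole block is in cache, then copy that block). The name
block_prefetch_copy() is made up, the second pass uses plain memcpy() where
the real routine would use MMX loads and MOVNTQ stores, and the 64-byte
line size is the Athlon value from above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_LINE 64          /* Athlon cache line size in bytes */
#define BLOCK_SIZE (8 * 1024)  /* prefetch granularity: one 8K block */

/* Hypothetical helper: copy n bytes, pulling each 8K source block into
 * the cache before copying it. */
static void *block_prefetch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    volatile unsigned char sink = 0;  /* keeps the touch loads alive */

    assert(dst != NULL && src != NULL);

    while (n > 0) {
        size_t chunk = n < BLOCK_SIZE ? n : BLOCK_SIZE;
        size_t off;

        /* Pass 1: read one byte per cache line so the block is fetched. */
        for (off = 0; off < chunk; off += CACHE_LINE)
            sink = s[off];

        /* Pass 2: copy the block, which should now hit the cache. */
        memcpy(d, s, chunk);

        d += chunk;
        s += chunk;
        n -= chunk;
    }
    (void)sink;
    return dst;
}
```

Because it walks the source as one contiguous run, a routine like this
cannot skip padding areas between rows, which is the limitation mentioned
above for use inside XVID.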

gruel



