[XviD-devel] Profiling XVID, Part II
Christoph Lampert
chl at math.uni-bonn.de
Sun Mar 2 14:21:38 CET 2003
On Sat, 1 Mar 2003, Michael Militzer wrote:
> again: forget it. Memory transfers are unfortunately even more dominant for
> decoding than for encoding. And you have just profiled decoding with yv12
> output. Just try the same for rgb output: rgb conversion needs more time than
> the whole decoding process...
I never use RGB output, and thanks to graphics card overlays, I think nobody
else should, either :)
Still, I just wanted to post a quick result. I'm no ASM guru, and it took me
quite a while to debug the few instructions, but I finally ported AMD's
fast-memcpy example for Athlon, which prefetches complete 8K blocks
of memory instead of just a few bytes in every iteration.
Maybe we can't use it in XVID, since we have to skip padding areas, but
after all it was just a test for prefetching:
Athlon XP 1.4GHz (hardware prefetch, 64-byte cache line, DDR-PC2100)
  Method                                                  Time     Throughput
  glibc memcpy()                                          3.250s   146 MB/s
  with MOVQ                                               3.080s   154 MB/s
  AMD reference (fistful of cache for Athlon)             0.690s   689 MB/s
  arjanv's MOVNTQ (without prefetch, for Athlon)          0.830s   573 MB/s
  arjanv's MOVNTQ (with prefetch, for Athlon)             0.830s   573 MB/s
  arjanv's interleaved MOVQ/MOVNTQ without prefetchNTA    1.110s   428 MB/s
  arjanv's interleaved MOVQ/MOVNTQ with prefetchNTA       0.840s   566 MB/s
Btw., according to AMD, 1976 MB/s is possible with an XP 1800+.
Pentium III 700MHz (32-byte cache line, SDR-PC100)
  Method                                                  Time     Throughput
  glibc memcpy()                                          5.960s    79 MB/s
  with MOVQ                                               8.280s    57 MB/s
  AMD reference (fistful of cache)                        1.290s   368 MB/s
  arjanv's MOVNTQ (without prefetch)                      2.320s   205 MB/s
  arjanv's MOVNTQ (with prefetch)                         2.250s   211 MB/s
  arjanv's interleaved MOVQ/MOVNTQ without prefetchNTA    2.970s   160 MB/s
  arjanv's interleaved MOVQ/MOVNTQ with prefetchNTA       2.290s   207 MB/s
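The block-prefetch idea described above (pull a complete 8K block into
cache first, then copy it) can be sketched in plain C roughly as follows.
This is an illustration only, not the ported AMD assembly behind the numbers
above: the name block_prefetch_copy and the constants BLOCK_SIZE and
LINE_SIZE are made up for this sketch, and the second pass simply calls
memcpy() where the assembly versions use MOVQ/MOVNTQ, so it will not
reproduce the timings.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8192  /* 8K block, as in the post (assumed constant name) */
#define LINE_SIZE  64    /* Athlon cache line; the Pentium III above uses 32 */

/* Hypothetical helper: copy n bytes from src to dst, one 8K block at a time.
 * The two regions must not overlap. */
static void block_prefetch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    volatile unsigned char sink = 0;  /* volatile keeps the touch loop alive */

    while (n > 0) {
        size_t chunk = n < BLOCK_SIZE ? n : BLOCK_SIZE;

        /* Pass 1: read one byte from every cache line of the block, so the
         * whole block is brought into cache before any store is issued. */
        for (size_t off = 0; off < chunk; off += LINE_SIZE)
            sink = s[off];

        /* Pass 2: copy the block, which should now be served from cache. */
        memcpy(d, s, chunk);

        d += chunk;
        s += chunk;
        n -= chunk;
    }
    (void)sink;
}
```

Separating the reads from the writes is the point of the technique: the
memory bus handles one long run of reads followed by one long run of writes
per block, instead of switching direction every few bytes. A call such as
block_prefetch_copy(dst, src, size) is a drop-in for memcpy() on
non-overlapping buffers, which is also why it does not directly fit image
planes whose padding areas have to be skipped.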
gruel