[XviD-devel] Quality optimization
skal
skal at planet-d.net
Wed Feb 26 15:56:57 CET 2003
Hi
On Tue, 2003-02-25 at 18:50, Christoph Lampert wrote:
> since you are still (or again) amongst us ;-)
bwehe
> I saw in your xvid_bench.c code that it checks e.g. SAD speed on
> 16*16 arrays (so stride is 16) and only in perfect alignment. Isn't that
> untypical, because in "real life" stride should be something like 720,
> in any way much larger than cacheline,
well, actually, xvid_bench.c is meant to:
a) perform some unit tests and non-regression (CRC)
b) provide a *lower* bound on how long a func takes to
run, given perfect conditions.
In real life, most of xvid_bench.c's funcs will run slower,
for sure, and a test suite of sequences will prove it :)
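To make the stride question concrete, here is a minimal plain-C sketch of a 16x16 SAD that takes the stride as a parameter. The name sad16_c and its signature are made up for illustration; this is not xvid's actual routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not xvid's code): sum of absolute differences
 * over a 16x16 block. 'stride' is the distance in bytes between the
 * starts of two consecutive rows, for both 'cur' and 'ref'. */
static uint32_t sad16_c(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    uint32_t sad = 0;

    assert(stride >= 16);   /* rows must not overlap */
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;      /* step to the next row */
        ref += stride;
    }
    return sad;
}
```

With stride 16, as in the bench, the whole block is 256 contiguous bytes. With a stride such as 720, every row starts 720 bytes after the previous one, so (assuming 64-byte cachelines) each of the 16 rows sits on at least one cacheline of its own.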
> and also only "Reference" pointer
> would be aligned, not "Current"? Or doesn't this matter on x86?
Well, you of course get a (few ticks) penalty in case of
misalignment during memory access. It starts to matter
with SSE2, where special instructions are provided for
known aligned (or un-aligned) read/write. Sure, a good
assumption is that 'current' is aligned (to 16 bytes for SSE2;
no chroma here!) whereas 'ref' isn't.
Here's, for instance, a simple 16x8 ref->cur SSE2 copy:
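; Register roles are inferred from the addressing, not stated:
;   eax = ref (possibly unaligned), ecx = cur (16-byte aligned),
;   edx = stride (multiple of 16),  ebx = 3*stride
; rows 0-3 of ref: unaligned loads
movdqu xmm0, [eax]
movdqu xmm1, [eax+edx]
movdqu xmm2, [eax+2*edx]
movdqu xmm3, [eax+ebx]
lea eax, [eax+4*edx]
; rows 4-7 of ref
movdqu xmm4, [eax]
movdqu xmm5, [eax+edx]
movdqu xmm6, [eax+2*edx]
movdqu xmm7, [eax+ebx]
; rows 0-3 of cur: aligned stores
movdqa [ecx], xmm0
movdqa [ecx+edx], xmm1
movdqa [ecx+2*edx], xmm2
movdqa [ecx+ebx], xmm3
lea ecx, [ecx+4*edx]
; rows 4-7 of cur
movdqa [ecx], xmm4
movdqa [ecx+edx], xmm5
movdqa [ecx+2*edx], xmm6
movdqa [ecx+ebx], xmm7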
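For comparison, roughly the same copy can be sketched in C with SSE2 intrinsics. The function name and signature are hypothetical, and the sketch assumes an x86 target with SSE2, 'cur' aligned to 16 bytes and a stride that is a multiple of 16 (otherwise the aligned stores would fault).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical C counterpart of a 16x8 ref->cur copy: unaligned
 * loads from 'ref' (movdqu), aligned stores to 'cur' (movdqa). */
static void copy16x8_sse2(uint8_t *cur, const uint8_t *ref, size_t stride)
{
    /* Assumption: cur is 16-byte aligned and stride keeps it so. */
    assert(((uintptr_t)cur & 15) == 0 && (stride & 15) == 0);

    for (int y = 0; y < 8; y++) {
        __m128i row = _mm_loadu_si128((const __m128i *)(ref + y * stride));
        _mm_store_si128((__m128i *)(cur + y * stride), row);
    }
}
```

A compiler is free to unroll this loop; the hand-written version simply fixes the schedule (eight loads, then eight stores) explicitly.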
>
> Also, even though your code is so fast, I didn't find any
> "prefetch" instructions in ASM or C whereas ffmpeg's SAD routines are full
> of them. Didn't you test them, or didn't they yield a speedup?
I've played with them a little, without any definitive conclusion
to provide. I thought Pentium4's hardware prefetch would be the
cure, but it appears that this platform is sometimes slower
(with my code, not xvid) than a PIII. Since I'm most of the time
limited by memory bandwidth now, it might be time to really
use prefetch. As a rule of thumb, I'd say, taking
motion compensation as an example, that 'ref' should be
prefetched, and 'cur' should be written with non-temporal
moves (except for b-frames). From my small experience, prefetch
can potentially
be very powerful (cf. all the various memcpy() flavors), but
it's also very very easy to use it very very badly, especially
for data whose lifespan is not clear enough. So...
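As an illustration of that rule of thumb, here is a small C sketch that prefetches 'ref' and writes 'cur' with non-temporal stores. Everything in it is an assumption rather than tested tuning: the function name, the two-row prefetch distance and the T0 hint are arbitrary, and whether it helps at all depends on the CPU and on how soon 'cur' is read again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _mm_sfence */
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128 */

/* Hypothetical sketch: copy 'rows' rows of 16 bytes from ref to cur.
 * ref is prefetched ahead; cur is written with non-temporal stores,
 * which bypass the cache (a poor choice if cur is re-read soon). */
static void copy16_nt(uint8_t *cur, const uint8_t *ref, size_t stride, int rows)
{
    /* Assumption: cur is 16-byte aligned and stride keeps it so. */
    assert(((uintptr_t)cur & 15) == 0 && (stride & 15) == 0);

    for (int y = 0; y < rows; y++) {
        /* Prefetch distance of 2 rows is a guess, not a measured value. */
        if (y + 2 < rows)
            _mm_prefetch((const char *)(ref + 2 * stride), _MM_HINT_T0);

        __m128i row = _mm_loadu_si128((const __m128i *)ref);
        _mm_stream_si128((__m128i *)cur, row);  /* movntdq */
        ref += stride;
        cur += stride;
    }
    _mm_sfence();  /* make the streamed stores visible before returning */
}
```

The 'lifespan' caveat shows up directly here: if the destination is needed again shortly (the b-frame case), the streamed data has to come back from memory, and an ordinary store would likely have been the better choice.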
bye,
Skal