[XviD-devel] Qpel Problem Samples

Christoph Lampert chl at math.uni-bonn.de
Fri Sep 5 18:17:43 CEST 2003


On Fri, 5 Sep 2003 iibot at gmx.at wrote:
> That reminds me of a question I wanted to ask for a long time:
> Do you use software prefetch for motion estimation? I know that P4 (and AXP)
> HW prefetch is inappropriate in this case.

We tested a lot with prefetch, but it didn't show much effect.

One problem, as I see it, is that prefetch has to be used very carefully
so that it doesn't slow things down. This isn't memcpy() or a simple dot
product, where you can grab several KB ("a whole fistful of cache") at
once, like the examples for prefetch do. Essentially you need to fetch
16x16=256 bytes once, from 16 different positions in memory at the same
time with a fixed stride. After that, it's either only 16 more bytes or
even nothing, because cache lines are more than 16 bytes long, so going
"right" or "left" in the search should already be cached.
But you don't know if any positions other than the current one will be
searched at all. So if you prefetch without need, you waste precious cache.
Also, there is the problem of where to put the instructions. It should be
timed such that the memory is available exactly when it's needed, but due
to different CPU and memory speeds, that can differ very much. We don't
want ASM code in the core code of XVID, to keep it platform independent,
and I don't know a compiler that knows about prefetch. :-(
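
To make the access pattern above concrete, here is a minimal sketch in
plain C. It is not XviD code: the names prefetch_block16, sad16 and
sad16_min2 are hypothetical, and the hint relies on __builtin_prefetch,
which is assumed to be available (GCC-compatible compilers); elsewhere
the macro expands to nothing. It issues one hint per row of the next
16x16 candidate while the current one is being compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: __builtin_prefetch exists on GCC-compatible compilers.
 * On other compilers the hint is simply dropped. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* Hint that the 16 rows of a 16x16 block will be read soon. The rows
 * sit at 16 different addresses (ref + y*stride), so one hint per row
 * is issued. A row that straddles a cache-line boundary would need a
 * second hint, which this sketch does not issue. */
static void prefetch_block16(const uint8_t *ref, ptrdiff_t stride)
{
    for (int y = 0; y < 16; y++)
        PREFETCH_READ(ref + y * stride);
}

/* Sum of absolute differences over a 16x16 block. */
static uint32_t sad16(const uint8_t *cur, const uint8_t *ref,
                      ptrdiff_t stride)
{
    uint32_t sum = 0;
    assert(stride >= 16);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sum;
}

/* Compare two candidates: hint the second one, then compute the SAD of
 * the first while the hinted rows are (hopefully) on their way. */
static uint32_t sad16_min2(const uint8_t *cur, const uint8_t *ref_a,
                           const uint8_t *ref_b, ptrdiff_t stride)
{
    prefetch_block16(ref_b, stride);
    uint32_t a = sad16(cur, ref_a, stride);
    uint32_t b = sad16(cur, ref_b, stride);
    return a < b ? a : b;
}
```

Whether the hint helps depends on exactly the points raised above: if
ref_b is never searched, or is already in cache because it is one pixel
to the left or right, the 16 hints are wasted work.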

But maybe one of the ASM experts can comment on this...


> Or did you save that for final optimizations on 1.0?

Of course... everything that isn't there will be soon. 
Do you want to help? ;-))

> Maybe it doesn't work at all, due to limited amount of buffers (outstanding
> memory load)?

Possible. I think it would be an interesting research topic how much
a state-of-the-art MPEG-4 encoder can benefit from well-done
software prefetch. There are some results about prefetching, but I think
only for MPEG-2, and not under realistic conditions.
Also, the methods depend very much on the encoder's structure, e.g. whether
halfpel interpolation is done block- or image-based.
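
As an illustration of the block- versus image-based distinction, here is
a small sketch in C of horizontal halfpel interpolation done both ways.
The function names are hypothetical, and the rounding (always round up,
no rounding control) is a simplifying assumption, not XviD's actual
behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horizontal half-pel sample between src[0] and src[1], rounding up. */
static inline uint8_t hpel_h(const uint8_t *src)
{
    return (uint8_t)((src[0] + src[1] + 1) >> 1);
}

/* Block-based: interpolate one 16x16 block on demand into a small
 * buffer. Reads 17 columns per row, so the caller must make sure that
 * src[y*stride + 16] is valid for y = 0..15. */
static void interp_block16_h(uint8_t dst[16 * 16], const uint8_t *src,
                             ptrdiff_t stride)
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y * 16 + x] = hpel_h(src + y * stride + x);
}

/* Image-based: interpolate the whole plane once into a second plane
 * with the same stride. The last column has no right neighbour, so it
 * is copied unchanged. */
static void interp_plane_h(uint8_t *dst, const uint8_t *src,
                           int width, int height, ptrdiff_t stride)
{
    assert(width >= 1);
    for (int y = 0; y < height; y++) {
        const uint8_t *s = src + y * stride;
        uint8_t *d = dst + y * stride;
        for (int x = 0; x + 1 < width; x++)
            d[x] = hpel_h(s + x);
        d[width - 1] = s[width - 1];
    }
}
```

In the block-based variant the search touches only the original
reference plane and a tiny scratch buffer; in the image-based variant it
reads from extra full-size planes, so the memory traffic that a prefetch
scheme has to cover looks quite different.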

gruel



