Re[2]: [XviD-devel] motion estimation for B/P/I decision

Marco Al xvid-devel@xvid.org
Sat, 21 Sep 2002 16:56:56 +0200


Marc FD wrote:

> I will try to understand it now. It seems dumb,
> but I understand C better than English ;)
> When I have understood how it works, I will
> ASM-optimise it. Any hints for that?

Well, for the calculation of the accumulated sum you might want to accumulate
along the columns first and then sum the results along the scanline to get the
total accumulated sum at a pixel (I did it the other way round). Because the
final summation is inherently serial, it is best to have it run along the
scanline; the column accumulation is the part you can parallelize, since each
column is accumulated independently of its neighbours.
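A minimal C sketch of the column-first order, assuming an 8-bit plane of width `w` and height `h` and a 32-bit table of the same size. `build_sat` and the one-scanline `colsum` buffer are hypothetical names for illustration, not XviD code. The column update touches each pixel independently, so SIMD can process several pixels at a time; the running sum along the scanline carries a dependency from one pixel to the next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds sat[y*w + x] = sum of src[j*w + i] over all j <= y, i <= x.
 * colsum (w entries) holds a running sum per column: one scanline of storage. */
static void build_sat(const uint8_t *src, int w, int h,
                      uint32_t *sat, uint32_t *colsum)
{
    assert(w > 0 && h > 0);
    for (int x = 0; x < w; x++)
        colsum[x] = 0;

    for (int y = 0; y < h; y++) {
        const uint8_t *row = src + (size_t)y * w;
        uint32_t *out = sat + (size_t)y * w;

        /* Column step: every column is updated independently (SIMD-friendly). */
        for (int x = 0; x < w; x++)
            colsum[x] += row[x];

        /* Scanline step: serial running sum of the column sums. */
        uint32_t run = 0;
        for (int x = 0; x < w; x++) {
            run += colsum[x];
            out[x] = run;
        }
    }
}
```

Unsigned 32-bit entries may wrap around on large planes; that is harmless as long as every window sum you later obtain by differencing table entries fits in 32 bits.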

Also, it might be possible to keep all of the accumulated sum tables in cache by
having several inner loops between which you switch every X scanlines: first
you update and use the tables as normal, then after a while you switch and
start overwriting the values you won't be needing anymore anyway. (It's a pity
x86 doesn't support circular buffers; that would make this almost trivial.)
This means you can't perform the Q calculation after the accumulation, of
course; the two would have to be intermingled ... nasty :)
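A rough sketch of the rolling-buffer idea, assuming a hypothetical 8x8 window (hence 9 live scanlines of accumulated sums) and showing only the plain pixel-sum table; a Q calculation would run the same loop over each of the tables it needs. It uses modulo indexing as a software circular buffer rather than the several switched inner loops suggested above, so treat it as the straightforward version to optimise from. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIN 8           /* hypothetical 8x8 window */
#define RING (WIN + 1)  /* accumulated-sum scanlines kept live */

/* Streams an 8-bit plane once, keeping only RING scanlines of the accumulated
 * sum table (w + 1 entries each, entry 0 is a zero border). Each window sum is
 * produced as soon as its bottom scanline exists, so the per-window work is
 * interleaved with the accumulation.
 * ring: RING * (w + 1) entries, colsum: w entries,
 * out: (h - WIN + 1) * (w - WIN + 1) window sums. */
static void window_sums_ring(const uint8_t *src, int w, int h,
                             uint32_t *ring, uint32_t *colsum, uint32_t *out)
{
    assert(w >= WIN && h >= WIN);
    const size_t stride = (size_t)w + 1;
    const int ow = w - WIN + 1;

    memset(colsum, 0, (size_t)w * sizeof *colsum);
    memset(ring, 0, stride * sizeof *ring); /* padded row 0: nothing summed yet */

    for (int y = 0; y < h; y++) {
        /* Padded row y + 1 (sums over image rows 0..y) overwrites the slot that
         * held padded row y - WIN, which no window needs anymore. The modulo is
         * the software stand-in for a circular buffer. */
        uint32_t *cur = ring + (size_t)((y + 1) % RING) * stride;
        const uint8_t *row = src + (size_t)y * w;

        for (int x = 0; x < w; x++)     /* column step (parallelizable) */
            colsum[x] += row[x];
        uint32_t run = 0;
        cur[0] = 0;
        for (int x = 0; x < w; x++) {   /* scanline step (serial) */
            run += colsum[x];
            cur[x + 1] = run;
        }

        if (y + 1 >= WIN) {
            /* Windows covering image rows y-WIN+1..y. */
            const uint32_t *top = ring + (size_t)((y + 1 - WIN) % RING) * stride;
            uint32_t *o = out + (size_t)(y + 1 - WIN) * ow;
            for (int x = 0; x < ow; x++)
                o[x] = cur[x + WIN] - cur[x] - top[x + WIN] + top[x];
        }
    }
}
```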

If that cannot work (because the caches are too small to hold all the data),
then, because bandwidth is so important, you might even want to calculate the
accumulated sums twice; that way you can make do with only 2 scanlines of
storage per table (instead of 9 for the above method). When you calculate Q you
only use 2 scanlines from each accumulated sum table, so you would calculate
those, calculate the Qs for the entire scanline, and then update the
accumulated sums in those scanlines. This means twice as much work for
calculating the accumulated sums, but it might be worth it.
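The two-scanline variant, sketched under the same assumptions (hypothetical 8x8 window, pixel-sum table only, hypothetical names): `bot` and `top` are the only accumulated-sum storage, and each image row is folded into both of them, which is where the doubled accumulation work comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIN 8 /* hypothetical 8x8 window */

/* Adds the prefix sums of one image row to a padded accumulated-sum scanline
 * (w + 1 entries, acc[0] stays 0), advancing it by one image row. */
static void advance_row(uint32_t *acc, const uint8_t *row, int w)
{
    uint32_t run = 0;
    for (int x = 0; x < w; x++) {
        run += row[x];
        acc[x + 1] += run;
    }
}

/* Window sums using only two accumulated-sum scanlines (w + 1 entries each):
 * bot covers image rows 0..y, top covers rows 0..y-WIN.
 * out: (h - WIN + 1) * (w - WIN + 1) window sums. */
static void window_sums_two_lines(const uint8_t *src, int w, int h,
                                  uint32_t *top, uint32_t *bot, uint32_t *out)
{
    assert(w >= WIN && h >= WIN);
    const int ow = w - WIN + 1;
    memset(top, 0, ((size_t)w + 1) * sizeof *top);
    memset(bot, 0, ((size_t)w + 1) * sizeof *bot);

    for (int y = 0; y < h; y++) {
        advance_row(bot, src + (size_t)y * w, w);
        if (y >= WIN) /* second pass over an already-accumulated row */
            advance_row(top, src + (size_t)(y - WIN) * w, w);
        if (y + 1 >= WIN) {
            /* One full scanline of windows from just these two scanlines. */
            uint32_t *o = out + (size_t)(y + 1 - WIN) * ow;
            for (int x = 0; x < ow; x++)
                o[x] = bot[x + WIN] - bot[x] - top[x + WIN] + top[x];
        }
    }
}
```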

Of course, throw in some prefetches where applicable; not all of us have
Palominos and P4s, as some of XviD's assembly hackers seem to assume ;) (those
have hardware prefetch, my Duron doesn't :()
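For instance, with the GCC/Clang builtin `__builtin_prefetch`, a column-accumulation pass might request the next scanline while summing the current one. The 64-byte stride is an assumed cache line size, and issuing all of a row's prefetches up front is a deliberately simple placement; real code would spread them through the loop and tune the distance on the target CPU. `accumulate_columns` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_STRIDE 64 /* assumed cache line size in bytes */

/* Sums each column of an 8-bit plane into colsum (w entries), prefetching the
 * next scanline before summing the current one. */
static void accumulate_columns(const uint8_t *src, int w, int h, uint32_t *colsum)
{
    assert(w > 0 && h > 0);
    for (int x = 0; x < w; x++)
        colsum[x] = 0;

    for (int y = 0; y < h; y++) {
        const uint8_t *row = src + (size_t)y * w;
        if (y + 1 < h) {
            const uint8_t *next = row + w;
            for (int x = 0; x < w; x += PREFETCH_STRIDE)
                /* args: address, 0 = prefetch for read, 3 = high temporal locality */
                __builtin_prefetch(next + x, 0, 3);
        }
        for (int x = 0; x < w; x++)
            colsum[x] += row[x];
    }
}
```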

> BTW, can I use this implementation (or a modified one)
> for AviSynth? It would be cool for our Compare filter.

I don't mind, but do give credit (and a link) to Zhou Wang; he asks for that on
his site.

BTW, I have a nagging suspicion that the metric can break down on mostly flat
surfaces ... but I'll have to think about it and maybe experiment a bit.
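Assuming the metric here is Wang and Bovik's universal quality index Q, computed per window from the means, variances and covariance of the original window x and the distorted window y, a small worked case shows one way flat content can trip it up (assuming the windows are not black, so the mean term in the denominator is nonzero):

```latex
Q = \frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^2 + \sigma_y^2)\,(\bar{x}^2 + \bar{y}^2)}

\text{Both windows flat: } \sigma_x = \sigma_y = \sigma_{xy} = 0 \;\Rightarrow\; Q = \tfrac{0}{0}\ \text{(undefined)}

\text{$x$ flat, $y$ one level off in a single pixel: } \sigma_x = \sigma_{xy} = 0,\ \sigma_y^2 > 0 \;\Rightarrow\; Q = 0
```

Since Q reaches its best value of 1 only for identical windows, scoring a nearly identical flat window as 0 (and leaving the fully flat case undefined) would fit the suspected breakdown.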

Marco