[XviD-devel] GMC rc3 - TODO
skal
xvid-devel@xvid.org
15 Jan 2003 18:02:44 +0100
Gruel,
On Wed, 2003-01-15 at 17:24, Christoph Lampert wrote:
> > Moreover, weird sizes (not power of 2) were not tested...
>
> Weird sizes for what? For image size?
(for the sprite size, the one used in ROUND_DIV())
>
> P.S. Skal, of course, you _do_ beat the compiler (at least gcc 2.95), but
> not as high as I expected, only by about 10%-12% here... Aren't you
> tempted to make it at least 50% >;-)
>
well, the evil temptation would be to keep on optimizing
code that is still only in the test phase. Of course there
are still tricks to inject, but it would be premature, and
would make the code hardly readable...
For instance, clamping the vector to [0..W[ and
[0..H[ is faster using:

  int offset = 0;
  if ((uint32_t)F<W) offset = F;
  else if (F>=W) offset = W;
  if ((uint32_t)G<H) offset += G*stride;
  else if (G>=H) offset += H*stride;

but rather obfuscated...
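
Spelled out as a self-contained function, the trick might look like the
sketch below. The function names and the "plain" reference version are
made up for illustration, and W, H and stride are assumed to be
non-negative ints. Note that the snippet as written saturates at W and H
themselves. One unsigned compare covers the in-range case; a negative
value fails both tests and leaves the contribution at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version (hypothetical name): clamp F to [0, W] and
 * G to [0, H] with ordinary signed comparisons. */
int clamp_offset_plain(int F, int G, int W, int H, int stride)
{
    if (F < 0) F = 0; else if (F > W) F = W;
    if (G < 0) G = 0; else if (G > H) G = H;
    return G * stride + F;
}

/* Unsigned-compare version (hypothetical name). Casting a negative
 * int to uint32_t gives a huge value, so the first test of each pair
 * is true only for 0 <= F < W (resp. 0 <= G < H). */
int clamp_offset_fast(int F, int G, int W, int H, int stride)
{
    int offset = 0;

    assert(W >= 0 && H >= 0);   /* assumption: bounds are non-negative */

    if ((uint32_t)F < (uint32_t)W) offset = F;
    else if (F >= W)               offset = W;
    /* else F < 0: offset stays 0 */

    if ((uint32_t)G < (uint32_t)H) offset += G * stride;
    else if (G >= H)               offset += H * stride;
    /* else G < 0: nothing added */

    return offset;
}
```

In the common in-range case each coordinate costs a single compare
instead of two, which is where the gain would come from.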
Anyway, a better hunt for speed would be splitting
the loop in two:
1st pass: iterate F and G, clamp, store offsets and
weights into a temp array.
2nd pass: perform the bilinear interpolation itself, with
a 1-sample delay line
=> Tighter loops, and a 2nd pass very suitable
for MMX-izing
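
A rough sketch of that split, for one row of output samples, could look
like this. Everything here is an assumption made for illustration and
not the actual xvid_gmc.c code: the names (tap_t, gmc_pass1, gmc_pass2,
gmc_row), the 1/16-pel fixed-point format, the linear stepping of F and
G by dF and dG, and the clamping convention (xmax and ymax are the last
valid column and row, so the source needs at least 2x2 samples). The
1-sample delay line is left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 4                  /* assumed precision: 1/16 pel */
#define FRAC_ONE  (1 << FRAC_BITS)
#define MAX_ROW   16                 /* assumed maximum row length  */

typedef struct {
    int     offset;                  /* top-left source sample index */
    uint8_t wx, wy;                  /* weights in [0, FRAC_ONE]     */
} tap_t;

/* Clamp one fixed-point coordinate to [0, vmax]. At the far edge the
 * integer part is vmax-1 with full weight on the next sample, so the
 * 2x2 read in pass 2 never leaves the image. */
static int split_coord(int v, int vmax, uint8_t *w)
{
    if (v < 0)                      { *w = 0;        return 0; }
    if (v >= (vmax << FRAC_BITS))   { *w = FRAC_ONE; return vmax - 1; }
    *w = (uint8_t)(v & (FRAC_ONE - 1));
    return v >> FRAC_BITS;
}

/* Pass 1: iterate F and G, clamp, store offsets and weights. */
void gmc_pass1(tap_t *taps, int n, int F, int G, int dF, int dG,
               int xmax, int ymax, int stride)
{
    int i;
    for (i = 0; i < n; i++, F += dF, G += dG) {
        int x = split_coord(F, xmax, &taps[i].wx);
        int y = split_coord(G, ymax, &taps[i].wy);
        taps[i].offset = y * stride + x;
    }
}

/* Pass 2: bilinear interpolation only, no address arithmetic or
 * clamping left in the loop. */
void gmc_pass2(uint8_t *dst, const uint8_t *src, const tap_t *taps,
               int n, int stride)
{
    int i;
    for (i = 0; i < n; i++) {
        const uint8_t *p = src + taps[i].offset;
        int wx = taps[i].wx, wy = taps[i].wy;
        int top = p[0]      * (FRAC_ONE - wx) + p[1]          * wx;
        int bot = p[stride] * (FRAC_ONE - wx) + p[stride + 1] * wx;
        int sum = top * (FRAC_ONE - wy) + bot * wy;
        /* round to nearest and drop both fractional scalings */
        dst[i] = (uint8_t)((sum + (1 << (2 * FRAC_BITS - 1)))
                           >> (2 * FRAC_BITS));
    }
}

/* Convenience wrapper running both passes over one row. */
void gmc_row(uint8_t *dst, const uint8_t *src, int n, int F, int G,
             int dF, int dG, int xmax, int ymax, int stride)
{
    tap_t taps[MAX_ROW];
    assert(n >= 0 && n <= MAX_ROW && xmax >= 1 && ymax >= 1);
    gmc_pass1(taps, n, F, G, dF, dG, xmax, ymax, stride);
    gmc_pass2(dst, src, taps, n, stride);
}
```

The point of the temp array is that pass 2 has no branches and only
fixed multiply-adds per sample, which is the shape SIMD code wants.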
Also: move the calculation of avgMV out of the main loop:
it may fatten the code too much...
bye,
Skal
> xvid_gmc.c compiled with
>
> gcc -O2 -march=i686 -mcpu=i686 -funroll-loops -fstrict-aliasing
> -fomit-frame-pointer -fPIC -mpreferred-stack-boundary=4
>
> on lame PII-450.
>
>
> "ugly gruel ISO" version ===== test GMC =====
>
> res=2 ... [0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x7fb9 time=5.824sec 103.02 fps
> res=4 ... [0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x8a58 time=5.816sec 103.16 fps
> res=8 ... [0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x58c4 time=5.752sec 104.31 fps
> res=16 ...[0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x8a58 time=5.743sec 104.48 fps
>
>
> "Skal" version ===== test GMC =====
>
> res=2 ... [0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x7fb9 time=5.247sec 114.36 fps
> res=4 ... [0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x8a58 time=5.272sec 113.81 fps
> res=8 ... [0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x58c4 time=5.214sec 115.08 fps
> res=16 ...[0][1][2][0][1][2][0][1][2][0][1][2] ... crc=0x8a58 time=5.215sec 115.06 fps
>