[XviD-devel] PlainC optimization
Michael Militzer
michael at xvid.org
Wed Mar 5 17:59:05 CET 2003
Hi,
yes, avg4 is slower, I know. In the normal interpolate8x8_hv you can exploit
the fact that all input comes from one plane and from neighbouring positions
(+1, +stride, +stride+1). For avg4 you can pass arbitrary inputs (from
different image planes), so I have to recalculate everything, while
interpolate8x8_hv can reuse earlier intermediate results.
But I didn't claim that avg4 is faster, right? ;-)
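To make the difference concrete, here is a minimal plain-C sketch. It is not
the actual XviD source: the function names, the running-sum trick and the
rounding convention (sum + 2 - rounding) >> 2 are assumptions for
illustration. The hv version keeps the vertical sum of the previous column and
reuses it, whereas avg4 receives four unrelated pointers and has to load and
add all four taps for every pixel.

```c
#include <stdint.h>

/* Sketch only: half-pel "hv" interpolation of one 8x8 block.
   All four taps come from one plane: src, src+1, src+stride, src+stride+1. */
static void interp8x8_hv_sketch(uint8_t *dst, const uint8_t *src,
                                int stride, int rounding)
{
    for (int y = 0; y < 8; y++) {
        /* vertical sum of the leftmost column, reused while moving right */
        int left = src[0] + src[stride];
        for (int x = 0; x < 8; x++) {
            int right = src[x + 1] + src[stride + x + 1];
            dst[x] = (uint8_t)((left + right + 2 - rounding) >> 2);
            left = right;   /* the next pixel reuses this column sum */
        }
        src += stride;
        dst += stride;
    }
}

/* Sketch only: 4-way average of four arbitrary 8x8 blocks.
   The inputs may live in different planes, so nothing can be reused. */
static void avg4_8x8_sketch(uint8_t *dst,
                            const uint8_t *a, const uint8_t *b,
                            const uint8_t *c, const uint8_t *d,
                            int stride, int rounding)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((a[x] + b[x] + c[x] + d[x] + 2 - rounding) >> 2);
        a += stride; b += stride; c += stride; d += stride;
        dst += stride;
    }
}
```

Called with a = src, b = src + 1, c = src + stride and d = src + stride + 1,
the avg4 sketch produces the same block as the hv sketch, only without any
reuse of intermediate sums.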
Anyway, the main difference between my qpel avg mmx functions and the "old"
mmx halfpel interpolate functions is not that I use two (or four) pointers,
but that I calculate everything in packed bytes (and then have to add extra
instructions to cope with possible overflows), while the normal interpolate
code operates on words. Even though this introduces some additional
complexity, the overall instruction count is lower when using bytes instead
of words.
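As an illustration of the packed-byte idea, here is a small C sketch that
averages eight bytes at once inside a 64-bit integer, which stands in for one
MMX register. This is a well-known bit trick and not necessarily the exact
instruction sequence used in the avg2 mmx code; the function names and the
convention that rounding 0 means (a + b + 1) >> 1 are assumptions. The mask is
the "additional instruction": MMX has no per-byte shift, so bit 0 of every
byte is cleared before shifting to keep it from leaking into the neighbouring
byte.

```c
#include <stdint.h>

/* Clears bit 0 of every byte so a 64-bit shift acts like a per-byte shift. */
#define LSB_CLEAR 0xFEFEFEFEFEFEFEFEULL

/* Per byte: (a + b + 1) >> 1, assumed to correspond to rounding == 0.
   Uses a + b = 2*(a | b) - (a ^ b); no byte can borrow from its neighbour. */
static uint64_t avg2_bytes_round_up(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & LSB_CLEAR) >> 1);
}

/* Per byte: (a + b) >> 1, assumed to correspond to rounding == 1.
   Uses a + b = 2*(a & b) + (a ^ b); no byte can carry into its neighbour. */
static uint64_t avg2_bytes_round_down(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & LSB_CLEAR) >> 1);
}

/* Word-style reference: widen to int, add, shift, narrow again. */
static uint8_t avg2_ref(uint8_t a, uint8_t b, int rounding)
{
    return (uint8_t)((a + b + 1 - rounding) >> 1);
}
```

The word-based variant needs to unpack the bytes to 16 bits, add, shift and
pack again, and handles only four pixels per register; the byte variant stays
at eight pixels per register at the cost of the mask.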
So it should also be possible to speed up interpolate8x8_hv by replacing the
packed-word interpolation code with new packed-byte code...
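One possible way to do the 4-tap case in packed bytes is sketched below,
again with a 64-bit integer standing in for an MMX register. This is only an
assumption about how such code could look, not an existing XviD routine; the
function name, the LANES helper and the rounding convention
(a + b + c + d + 2 - rounding) >> 2 are made up for illustration. The upper
six bits of each byte are divided by four before adding, and the low two bits
are summed separately so the result stays exact.

```c
#include <stdint.h>

/* Replicate one byte value into all eight byte lanes. */
#define LANES(x) ((uint64_t)(x) * 0x0101010101010101ULL)

/* Per byte: (a + b + c + d + 2 - rounding) >> 2, without widening to words.
   rounding is expected to be 0 or 1. */
static uint64_t avg4_bytes_sketch(uint64_t a, uint64_t b,
                                  uint64_t c, uint64_t d, int rounding)
{
    const uint64_t hi_mask = LANES(0xFC);
    const uint64_t lo_mask = LANES(0x03);

    /* Upper six bits of each byte, already divided by 4.
       Per-byte sum is at most 4 * 63 = 252, so no lane overflows. */
    uint64_t hi = ((a & hi_mask) >> 2) + ((b & hi_mask) >> 2)
                + ((c & hi_mask) >> 2) + ((d & hi_mask) >> 2);

    /* Low two bits plus the rounding bias.
       Per-byte sum is at most 4 * 3 + 2 = 14, so no lane overflows. */
    uint64_t lo = (a & lo_mask) + (b & lo_mask)
                + (c & lo_mask) + (d & lo_mask) + LANES(2 - rounding);

    /* Add the carry (0..3) from the low bits; the mask removes bits that
       the shift pulled in from the neighbouring lane. */
    return hi + ((lo >> 2) & lo_mask);
}
```

Whether this actually beats the word-based hv code would have to be profiled;
the sketch only shows that the overflow handling is manageable.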
bye,
Michael
Quoting Christoph Lampert <chl at math.uni-bonn.de>:
> On Tue, 4 Mar 2003, Michael Militzer wrote:
>
> > Hi,
> >
> > have you checked xmm for comparison? Is it much faster than mmx?
> >
> > BTW: I believe that my interpolate8x8_avg2 function should be faster than
> > the normal interpolate8x8_[h,v] functions (at least mmx vs. mmx). However,
> > I think I never exactly profiled the functions (or if I did I forgot about
> > it). I remember that I had planned to replace the old mmx interpolation
> > code with the new one from avg2 but then didn't have the time/forgot
> > about it.
> >
> > So I would be quite interested to know how avg2 mmx performs vs. normal
> > interpolate[h,v] mmx. And since you are currently profiling anyway... ;-)
>
> Hm, avg4 seems to be worse, btw. I guess 4 pointers is too many registers,
> compared to simply calculating +1, +stride, +stride+1.
> And using 2 pointers or 4 pointers (avg2, avg4) is also not better in
> PlainC.
>
> gruel
>
> ***************************************************************
> P2 - 450 MHz
> ***************************************************************
> === test block motion ===
> PLAINC - interp- h-round0 0.508 usec iCrc=8107
> PLAINC - round1 0.510 usec iCrc=8100
> PLAINC - avg2- h-round0 0.621 usec iCrc=8107
> PLAINC - round1 0.615 usec iCrc=8100
> PLAINC - interp- v-round0 0.513 usec iCrc=8108
> PLAINC - round1 0.482 usec iCrc=8105
> PLAINC - avg2- v-round0 0.615 usec iCrc=8108
> PLAINC - round1 0.615 usec iCrc=8105
> PLAINC - interp-hv-round0 0.777 usec iCrc=8112
> PLAINC - round1 0.784 usec iCrc=8103
> PLAINC - avg4_hv_round0 1.240 usec iCrc=8112
> PLAINC - round1 1.242 usec iCrc=8103
> ---
> MMX - interp- h-round0 0.224 usec iCrc=8107
> MMX - round1 0.237 usec iCrc=8100
> MMX - avg2- h-round0 0.206 usec iCrc=8107
> MMX - round1 0.206 usec iCrc=8100
> MMX - interp- v-round0 0.225 usec iCrc=8108
> MMX - round1 0.234 usec iCrc=8105
> MMX - avg2- v-round0 0.204 usec iCrc=8108
> MMX - round1 0.206 usec iCrc=8105
> MMX - interp-hv-round0 0.336 usec iCrc=8112
> MMX - round1 0.340 usec iCrc=8103
> MMX - avg4_hv_round0 0.478 usec iCrc=8112
> MMX - round1 0.478 usec iCrc=8103
>
> *****************************************************************
> P3 - 700MHz
> *****************************************************************
> === test block motion ===
> PLAINC - interp- h-round0 0.328 usec iCrc=8107
> PLAINC - round1 0.328 usec iCrc=8100
> PLAINC - avg2- h-round0 0.396 usec iCrc=8107
> PLAINC - round1 0.397 usec iCrc=8100
> PLAINC - interp- v-round0 0.329 usec iCrc=8108
> PLAINC - round1 0.310 usec iCrc=8105
> PLAINC - avg2- v-round0 0.396 usec iCrc=8108
> PLAINC - round1 0.396 usec iCrc=8105
> PLAINC - interp-hv-round0 0.500 usec iCrc=8112
> PLAINC - round1 0.505 usec iCrc=8103
> PLAINC - avg4_hv_round0 0.798 usec iCrc=8112
> PLAINC - round1 0.798 usec iCrc=8103
> ---
> MMX - interp- h-round0 0.147 usec iCrc=8107
> MMX - round1 0.148 usec iCrc=8100
> MMX - avg2- h-round0 0.127 usec iCrc=8107
> MMX - round1 0.128 usec iCrc=8100
> MMX - interp- v-round0 0.142 usec iCrc=8108
> MMX - round1 0.148 usec iCrc=8105
> MMX - avg2- v-round0 0.129 usec iCrc=8108
> MMX - round1 0.127 usec iCrc=8105
> MMX - interp-hv-round0 0.216 usec iCrc=8112
> MMX - round1 0.219 usec iCrc=8103
> MMX - avg4_hv_round0 0.295 usec iCrc=8112
> MMX - round1 0.294 usec iCrc=8103
> ---
> MMXEXT - interp- h-round0 0.054 usec iCrc=8107
> MMXEXT - round1 0.070 usec iCrc=8100
> MMXEXT - interp- v-round0 0.050 usec iCrc=8108
> MMXEXT - round1 0.070 usec iCrc=8105
> MMXEXT - interp-hv-round0 0.111 usec iCrc=8112
> MMXEXT - round1 0.109 usec iCrc=8103
> ---