[XviD-devel] PlainC optimization

Wed Mar 5 17:59:05 CET 2003

Hi,

yes, avg4 is slower, I know. In the normal interpolate8x8_hv you can exploit
the fact that all input comes from one plane at neighbouring positions (+1,
+stride, +stride+1). For avg4 you can pass arbitrary inputs (even from
different image planes), so I have to recalculate everything, while
interpolate8x8_hv can reuse earlier intermediate results.
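
To illustrate the difference, here is a minimal plain-C sketch. It is not the
actual XviD code: the names sketch_hv and sketch_avg4 are made up, and it
assumes the usual halfpel formula (sum of four pixels + 2 - rounding) >> 2.
The hv version keeps the horizontal pair sums of one row and reuses them for
the next row; the four-pointer version cannot, because its inputs are
unrelated.

```c
#include <stdint.h>

/* Hypothetical sketch: hv halfpel on one plane. The horizontal pair sums
 * of row y+1 are computed once and reused as the "top" sums of the next row. */
static void sketch_hv(uint8_t *dst, const uint8_t *src, int stride, int rounding)
{
    uint16_t top[8], bot[8];
    int x, y;

    for (x = 0; x < 8; x++)
        top[x] = (uint16_t)(src[x] + src[x + 1]);

    for (y = 0; y < 8; y++) {
        const uint8_t *next = src + (y + 1) * stride;
        for (x = 0; x < 8; x++) {
            bot[x] = (uint16_t)(next[x] + next[x + 1]);
            dst[y * stride + x] =
                (uint8_t)((top[x] + bot[x] + 2 - rounding) >> 2);
            top[x] = bot[x]; /* intermediate result reused for the next row */
        }
    }
}

/* Hypothetical sketch: four independent inputs. Nothing is known about how
 * a, b, c and d relate, so every output pixel needs four fresh loads. */
static void sketch_avg4(uint8_t *dst,
                        const uint8_t *a, const uint8_t *b,
                        const uint8_t *c, const uint8_t *d,
                        int stride, int rounding)
{
    int x, y;

    for (y = 0; y < 8; y++) {
        for (x = 0; x < 8; x++) {
            int i = y * stride + x;
            dst[i] = (uint8_t)((a[i] + b[i] + c[i] + d[i] + 2 - rounding) >> 2);
        }
    }
}
```

Calling sketch_avg4(dst, src, src + 1, src + stride, src + stride + 1, stride,
rounding) gives the same output as sketch_hv(dst, src, stride, rounding), just
with more work per pixel.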

But I didn't claim that avg4 is faster, right? ;-)

Anyway, the main difference between my qpel avg mmx functions and the "old" mmx
halfpel interpolate functions is not that I use two (or four) pointers, but
that I calculate everything in packed bytes (and then have to add extra
instructions to cope with possible overflows), while the normal interpolate
code operates on words. Even though this introduces some additional
complexity, the overall instruction count is lower when using bytes instead
of words.
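
For readers who don't know the packed-byte approach, here is a rough plain-C
illustration of the idea. It uses a well-known bit trick for averaging bytes
without widening them to words, applied to eight bytes packed in a uint64_t
(standing in for an MMX register). This is an assumption about the general
technique, not a copy of the mmx code; the names avg8_round_up,
avg8_round_down and LSB_CLEAR are made up for this sketch.

```c
#include <stdint.h>

/* Mask that clears the lowest bit of every byte, so that the following
 * shift by one cannot move a bit into the neighbouring byte. */
#define LSB_CLEAR 0xFEFEFEFEFEFEFEFEULL

/* Per byte: (a + b + 1) >> 1, i.e. the rounding == 0 case.
 * Uses a + b == 2*(a | b) - (a ^ b), so no intermediate exceeds 8 bits. */
static uint64_t avg8_round_up(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & LSB_CLEAR) >> 1);
}

/* Per byte: (a + b) >> 1, i.e. the rounding == 1 case.
 * Uses a + b == 2*(a & b) + (a ^ b). */
static uint64_t avg8_round_down(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & LSB_CLEAR) >> 1);
}
```

The xor/mask/shift steps are the "additional instructions" needed to avoid
overflow, but eight pixels are handled at once and no unpacking to words or
repacking to bytes is required.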

So it should also be possible to speed up interpolate8x8_hv by replacing the
packed-word interpolation code with the new packed-byte code...

bye,
Michael

Quoting Christoph Lampert <chl at math.uni-bonn.de>:

> On Tue, 4 Mar 2003, Michael Militzer wrote:
> 
> > Hi,
> > 
> > have you checked xmm for comparison? Is it much faster than mmx?
> > 
> > BTW: I believe that my interpolate8x8_avg2 function should be faster
> > than the normal interpolate8x8_[h,v] functions (at least mmx vs. mmx).
> > However, I think I never exactly profiled the functions (or if I did, I
> > forgot about it). I remember that I had planned to replace the old mmx
> > interpolation code with the new one from avg2, but then didn't have the
> > time/forgot about it.
> > 
> > So I would be quite interested to know how avg2 mmx performs vs. normal
> > interpolate[h,v] mmx. And since you are currently profiling anyway... ;-)
> 
> Hm, avg4 seems to be worse, btw. I guess 4 pointers is too many registers,  
> compared to simply calculating +1, +stride, +stride+1.
> And using 2 pointers or 4 pointers (avg2, avg4) is also not better in
> PlainC. 
> 
> gruel
> 
> ***************************************************************
> P2 - 450 MHz
> ***************************************************************
>  ===  test block motion ===
> PLAINC - interp- h-round0 0.508 usec       iCrc=8107
> PLAINC -           round1 0.510 usec       iCrc=8100
> PLAINC -   avg2- h-round0 0.621 usec       iCrc=8107
> PLAINC -           round1 0.615 usec       iCrc=8100
> PLAINC - interp- v-round0 0.513 usec       iCrc=8108
> PLAINC -           round1 0.482 usec       iCrc=8105
> PLAINC -   avg2- v-round0 0.615 usec       iCrc=8108
> PLAINC -           round1 0.615 usec       iCrc=8105
> PLAINC - interp-hv-round0 0.777 usec       iCrc=8112
> PLAINC -           round1 0.784 usec       iCrc=8103
> PLAINC -   avg4_hv_round0 1.240 usec       iCrc=8112
> PLAINC -           round1 1.242 usec       iCrc=8103
>  --- 
> MMX    - interp- h-round0 0.224 usec       iCrc=8107
> MMX    -           round1 0.237 usec       iCrc=8100
> MMX    -   avg2- h-round0 0.206 usec       iCrc=8107
> MMX    -           round1 0.206 usec       iCrc=8100
> MMX    - interp- v-round0 0.225 usec       iCrc=8108
> MMX    -           round1 0.234 usec       iCrc=8105
> MMX    -   avg2- v-round0 0.204 usec       iCrc=8108
> MMX    -           round1 0.206 usec       iCrc=8105
> MMX    - interp-hv-round0 0.336 usec       iCrc=8112
> MMX    -           round1 0.340 usec       iCrc=8103
> MMX    -   avg4_hv_round0 0.478 usec       iCrc=8112
> MMX    -           round1 0.478 usec       iCrc=8103
> 
> *****************************************************************
> P3 - 700MHz
> *****************************************************************
>  ===  test block motion ===
> PLAINC - interp- h-round0 0.328 usec       iCrc=8107
> PLAINC -           round1 0.328 usec       iCrc=8100
> PLAINC -   avg2- h-round0 0.396 usec       iCrc=8107
> PLAINC -           round1 0.397 usec       iCrc=8100
> PLAINC - interp- v-round0 0.329 usec       iCrc=8108
> PLAINC -           round1 0.310 usec       iCrc=8105
> PLAINC -   avg2- v-round0 0.396 usec       iCrc=8108
> PLAINC -           round1 0.396 usec       iCrc=8105
> PLAINC - interp-hv-round0 0.500 usec       iCrc=8112
> PLAINC -           round1 0.505 usec       iCrc=8103
> PLAINC -   avg4_hv_round0 0.798 usec       iCrc=8112
> PLAINC -           round1 0.798 usec       iCrc=8103
>  --- 
> MMX    - interp- h-round0 0.147 usec       iCrc=8107
> MMX    -           round1 0.148 usec       iCrc=8100
> MMX    -   avg2- h-round0 0.127 usec       iCrc=8107
> MMX    -           round1 0.128 usec       iCrc=8100
> MMX    - interp- v-round0 0.142 usec       iCrc=8108
> MMX    -           round1 0.148 usec       iCrc=8105
> MMX    -   avg2- v-round0 0.129 usec       iCrc=8108
> MMX    -           round1 0.127 usec       iCrc=8105
> MMX    - interp-hv-round0 0.216 usec       iCrc=8112
> MMX    -           round1 0.219 usec       iCrc=8103
> MMX    -   avg4_hv_round0 0.295 usec       iCrc=8112
> MMX    -           round1 0.294 usec       iCrc=8103
>  --- 
> MMXEXT - interp- h-round0 0.054 usec       iCrc=8107
> MMXEXT -           round1 0.070 usec       iCrc=8100
> MMXEXT - interp- v-round0 0.050 usec       iCrc=8108
> MMXEXT -           round1 0.070 usec       iCrc=8105
> MMXEXT - interp-hv-round0 0.111 usec       iCrc=8112
> MMXEXT -           round1 0.109 usec       iCrc=8103
>  --- 