[XviD-devel] PlainC optimization

Christoph Lampert chl at math.uni-bonn.de
Tue Mar 4 20:30:36 CET 2003


On Tue, 4 Mar 2003, Michael Militzer wrote:

> Hi,
> 
> have you checked xmm for comparison? Is it much faster than mmx?

Well, I can't check on the P2, because that one doesn't have xmm. On the
P3, the MMX speedup is larger than on the P2:

PLAINC - interp- h-round0 0.323 usec       iCrc=8107
PLAINC -           round1 0.330 usec       iCrc=8100
PLAINC - interp- v-round0 0.325 usec       iCrc=8108
PLAINC -           round1 0.306 usec       iCrc=8105
PLAINC - interp-hv-round0 0.496 usec       iCrc=8112
PLAINC -           round1 0.499 usec       iCrc=8103
 ---
MMX    - interp- h-round0 0.273 usec       iCrc=8107
MMX    -           round1 0.281 usec       iCrc=8100
MMX    - interp- v-round0 0.298 usec       iCrc=8108
MMX    -           round1 0.293 usec       iCrc=8105
MMX    - interp-hv-round0 0.432 usec       iCrc=8112
MMX    -           round1 0.432 usec       iCrc=8103
 ---
MMXEXT - interp- h-round0 0.211 usec       iCrc=8107
MMXEXT -           round1 0.214 usec       iCrc=8100
MMXEXT - interp- v-round0 0.161 usec       iCrc=8108
MMXEXT -           round1 0.170 usec       iCrc=8105
MMXEXT - interp-hv-round0 0.267 usec       iCrc=8112
MMXEXT -           round1 0.268 usec       iCrc=8103

 
> BTW: I believe that my interpolate8x8_avg2 function should be faster than
> the normal interpolate8x8_[h,v] functions (at least mmx vs. mmx). However I
> think I never exactly profiled the functions (or if I did I forgot about
> it). I remember that I had planned to replace the old mmx interpolation code
> with the new one from avg2 but then didn't have the time/forgot about it.
> 
> So I would be quite interested to know how avg2 mmx performs vs. normal
> interpolate[h,v] mmx. And since you are currently profiling anyway... ;-)
> 

You seem to be right! The avg2 MMX version seems to be faster than the
normal interpolate MMX version.

--- P2 ---
PLAINC - inter-avg2_h_round0 0.609 usec       iCrc=8107
PLAINC -              round1 0.610 usec       iCrc=8100

MMX    - inter-avg2_h_round0 0.387 usec       iCrc=8107
MMX    -              round1 0.387 usec       iCrc=8100


--- P3 ---

PLAINC - inter-avg2_h_round0 0.392 usec       iCrc=8107
PLAINC -              round1 0.392 usec       iCrc=8100

MMX    - inter-avg2_h_round0 0.249 usec       iCrc=8107
MMX    -              round1 0.251 usec       iCrc=8100
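
For context on why avg2 can stand in for the h/v routines: a half-pel
sample is just the rounded average of a pixel and its neighbour, so a
generic two-source average covers both cases. Below is a minimal plain C
sketch, assuming avg2 computes (a + b + 1 - rounding) >> 1. The function
names and signatures are made up for illustration and are not the XviD
API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical two-source average over an 8x8 block.
 * rounding is expected to be 0 or 1. */
static void avg2_8x8(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                     size_t stride, unsigned rounding)
{
    const unsigned r = 1 - rounding;   /* computed once per block */
    for (size_t j = 0; j < 8; j++) {
        for (size_t i = 0; i < 8; i++)
            dst[i] = (uint8_t)((src1[i] + src2[i] + r) >> 1);
        dst  += stride;
        src1 += stride;
        src2 += stride;
    }
}

/* Horizontal half-pel: average each pixel with its right neighbour. */
static void halfpel_h_via_avg2(uint8_t *dst, const uint8_t *src,
                               size_t stride, unsigned rounding)
{
    avg2_8x8(dst, src, src + 1, stride, rounding);
}

/* Vertical half-pel: average each pixel with the one below it. */
static void halfpel_v_via_avg2(uint8_t *dst, const uint8_t *src,
                               size_t stride, unsigned rounding)
{
    avg2_8x8(dst, src, src + stride, stride, rounding);
}
```

The h case reads 9 columns and the v case reads 9 rows of the source, so
the caller has to make sure those extra pixels are readable.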


gruel



> >
> > Sorry, me again...
> > I just checked the same for P2 450 MHz (MMX, no MMXEXT).
> >
> >  ===  test block motion ===
> > PLAINC - interp- h-round0 0.502 usec       iCrc=8107
> > PLAINC -           round1 0.511 usec       iCrc=8100
> > PLAINC - interp- v-round0 0.504 usec       iCrc=8108
> > PLAINC -           round1 0.475 usec       iCrc=8105
> > PLAINC - interp-hv-round0 0.771 usec       iCrc=8112
> > PLAINC -           round1 0.775 usec       iCrc=8103
> >  ---
> > MMX    - interp- h-round0 0.454 usec       iCrc=8107
> > MMX    -           round1 0.455 usec       iCrc=8100
> > MMX    - interp- v-round0 0.466 usec       iCrc=8108
> > MMX    -           round1 0.466 usec       iCrc=8105
> > MMX    - interp-hv-round0 0.670 usec       iCrc=8112
> > MMX    -           round1 0.671 usec       iCrc=8103
> >
> >
> > Is this possible? Plain MMX really can't do better than _that_?
> > Ouch...
> >
> > gruel
> >
> >
> >
> >
> >
> > On Tue, 4 Mar 2003, Christoph Lampert wrote:
> >
> > > Hi,
> > >
> > > if anyone out there is bored: XVID has lots of places where C-code can be
> > > optimized (in particular many routines for which MMX equivalents exist
> > > are not optimized at all):
> > >
> > > I did the simplest tasks for interpolate8x8: Loop unrolling, removal of
> > > dependencies, removal of redundant calculations (of "1-rounding" in this
> > > case)
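
To make the three steps above concrete, here is a rough sketch for the
horizontal half-pel case. It is not the actual XviD code: the function
names, the signature, and the formula (a + b + 1 - rounding) >> 1 are
assumptions for illustration only.

```c
#include <stdint.h>
#include <stddef.h>

/* Straightforward version: index arithmetic and (1 - rounding)
 * are recomputed for every pixel. rounding is expected to be 0 or 1. */
static void halfpel_h_naive(uint8_t *dst, const uint8_t *src,
                            size_t stride, unsigned rounding)
{
    for (size_t j = 0; j < 8; j++)
        for (size_t i = 0; i < 8; i++) {
            unsigned sum = src[j * stride + i] + src[j * stride + i + 1];
            dst[j * stride + i] = (uint8_t)((sum + 1 - rounding) >> 1);
        }
}

/* Sketch of the optimized variant: the rounding term is computed once,
 * the inner loop is fully unrolled, and the row pointers are advanced
 * instead of multiplying by stride for each pixel. */
static void halfpel_h_unrolled(uint8_t *dst, const uint8_t *src,
                               size_t stride, unsigned rounding)
{
    const unsigned r = 1 - rounding;   /* hoisted "1-rounding" */
    for (size_t j = 0; j < 8; j++) {
        dst[0] = (uint8_t)((src[0] + src[1] + r) >> 1);
        dst[1] = (uint8_t)((src[1] + src[2] + r) >> 1);
        dst[2] = (uint8_t)((src[2] + src[3] + r) >> 1);
        dst[3] = (uint8_t)((src[3] + src[4] + r) >> 1);
        dst[4] = (uint8_t)((src[4] + src[5] + r) >> 1);
        dst[5] = (uint8_t)((src[5] + src[6] + r) >> 1);
        dst[6] = (uint8_t)((src[6] + src[7] + r) >> 1);
        dst[7] = (uint8_t)((src[7] + src[8] + r) >> 1);
        dst += stride;
        src += stride;
    }
}
```

Both variants produce the same output, so the CRC check in the test
stays valid while only the timing changes.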
> > >
> > > Of course it's not important for everyone with MMX, but if it helps on
> > > other platforms and doesn't make the code too unreadable... why not?
> > >
> > >
> > >
> > > before:
> > >  ===  test block motion ===
> > > PLAINC - interp- h-round0 1.992 usec       iCrc=8107
> > > PLAINC -           round1 1.990 usec       iCrc=8100
> > > PLAINC - interp- v-round0 1.989 usec       iCrc=8108
> > > PLAINC -           round1 1.989 usec       iCrc=8105
> > > PLAINC - interp-hv-round0 3.181 usec       iCrc=8112
> > > PLAINC -           round1 3.180 usec       iCrc=8103
> > >
> > > after:
> > >  ===  test block motion ===
> > > PLAINC - interp- h-round0 0.322 usec       iCrc=8107
> > > PLAINC -           round1 0.329 usec       iCrc=8100
> > > PLAINC - interp- v-round0 0.343 usec       iCrc=8108
> > > PLAINC -           round1 0.306 usec       iCrc=8105
> > > PLAINC - interp-hv-round0 0.496 usec       iCrc=8112
> > > PLAINC -           round1 0.497 usec       iCrc=8103
> > >
> > >
> > > Yeah, I'm such a super-hero ;-)))
> > >
> > > gruel
> > >
> > >
> > > _______________________________________________
> > > XviD-devel mailing list
> > > XviD-devel at xvid.org
> > > http://list.xvid.org/mailman/listinfo/xvid-devel
> > >


