[XviD-devel] Quality optimization
Christoph Lampert
chl at math.uni-bonn.de
Wed Feb 26 19:24:19 CET 2003
On 25 Feb 2003, skal wrote:
>
> Hi,
>
> almost forgot this one too:
>
> On Wed, 2003-01-22 at 20:01, Marco Al wrote:
> > Christoph Lampert wrote:
> >
> > >> Do we have some timings for a 8 bit Hadamard transform yet?
> > >
> > > I did some a while ago of skal's MMXEXT(?) version and posted them to
> > > the list. I don't remember, but might have been twice the speed of DCT,
> > > but half the speed of SAD.
> >
> > The unattributed asm code only managed 173 cycles with 8 bits according to
> > the source; that is not twice as fast as DCT AFAIK.
> >
> here's a C/MMX/SSE version of the Hadamard transform (16bits).
> Without the 'pshufw' re-ordering, output columns are re-ordered
> according to: [03127465]. C-version spits the correct order...
Are you really saying these routines calculate Hadamard? Boy, you _are_ good,
they are hyperfast!
DCT on Athlon XP is:
PLAINC - 1.110 usec
MMX - 0.258 usec
MMXEXT - 0.258 usec
SSE2 - 0.272 usec
3DNOW - 1.121 usec
3DNOWE - 1.118 usec
IDCT is
PLAINC - 1.395 usec (<- slower than fDCT?)
MMX - 0.219 usec
MMXEXT - 0.199 usec
SSE2 - 0.219 usec
3DNOW - 1.247 usec
3DNOWE - 0.184 usec
whereas Hadamard is
PLAINC - 0.549 usec
MMXEXT - 0.089 usec
0.089 usec is about the time that sad16() with MMXEXT needs, too,
so a search routine based on Hadamard+SAD should not slow things down too
much.
Btw, what we would need in the end is a SATD function (SAD of the
transformed blocks), so we would either have to do
    SAD( Hadamard(Cur), Hadamard(Ref) )       (*)
with the usual SAD routine, or
    sum( abs( Hadamard(Cur - Ref) ) )         (**)
In theory these should be identical (Hadamard is linear), but maybe they
are not...
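To make the comparison of (*) and (**) concrete, here is a minimal plain-C sketch of both. It is not skal's code and not anything from the XviD tree: the names hadamard8_1d, hadamard8x8, satd_two_transforms and satd_one_transform are made up for illustration, the transform is unnormalized, and its output is in natural Hadamard order (the order does not matter for a sum of absolute values). With exact integer arithmetic the two variants give the same number, so any difference would have to come from rounding, scaling or saturation inside a particular implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: in-place 8-point Hadamard butterflies on
 * v[0], v[stride], ..., v[7*stride] (unnormalized, natural order). */
static void hadamard8_1d(int32_t *v, int stride)
{
    for (int len = 1; len < 8; len <<= 1) {
        for (int i = 0; i < 8; i += 2 * len) {
            for (int j = i; j < i + len; j++) {
                int32_t a = v[j * stride];
                int32_t b = v[(j + len) * stride];
                v[j * stride] = a + b;
                v[(j + len) * stride] = a - b;
            }
        }
    }
}

/* Hypothetical helper: 2-D 8x8 transform, rows first, then columns. */
static void hadamard8x8(const int16_t *in, int32_t *out)
{
    for (int i = 0; i < 64; i++)
        out[i] = in[i];
    for (int r = 0; r < 8; r++)
        hadamard8_1d(out + 8 * r, 1);
    for (int c = 0; c < 8; c++)
        hadamard8_1d(out + c, 8);
}

/* (*)  SAD( Hadamard(Cur), Hadamard(Ref) ): two transforms. */
uint32_t satd_two_transforms(const int16_t *cur, const int16_t *ref)
{
    int32_t hc[64], hr[64];
    uint32_t sum = 0;

    hadamard8x8(cur, hc);
    hadamard8x8(ref, hr);
    for (int i = 0; i < 64; i++)
        sum += (uint32_t)abs(hc[i] - hr[i]);
    return sum;
}

/* (**) sum( abs( Hadamard(Cur - Ref) ) ): one transform. */
uint32_t satd_one_transform(const int16_t *cur, const int16_t *ref)
{
    int16_t diff[64];
    int32_t hd[64];
    uint32_t sum = 0;

    for (int i = 0; i < 64; i++)
        diff[i] = (int16_t)(cur[i] - ref[i]);
    hadamard8x8(diff, hd);
    for (int i = 0; i < 64; i++)
        sum += (uint32_t)abs(hd[i]);
    return sum;
}
```

Both functions take two 8x8 blocks stored as 64 contiguous int16_t values. Note that (**) needs only one transform per candidate block, while (*) needs two unless Hadamard(Cur) is computed once and cached across the search.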
Anyway, would it be faster to combine these steps into one larger routine,
or not? Again, I believe it would, because for (**) the result of the
Hadamard transform doesn't have to be saved, only summed up, but of course
I'm no expert...
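As a rough illustration of what such a combined routine for (**) could look like, here is a plain-C sketch. The names wht8 and satd8x8_fused and the (cur, ref, stride) interface are assumptions made up for this example, not existing XviD functions, and nothing here has been timed. The point is only that the second pass can accumulate the absolute coefficients directly instead of writing a transformed block back to memory.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: in-place 8-point unnormalized Hadamard butterflies. */
static void wht8(int32_t v[8])
{
    for (int len = 1; len < 8; len <<= 1) {
        for (int i = 0; i < 8; i += 2 * len) {
            for (int j = i; j < i + len; j++) {
                int32_t a = v[j], b = v[j + len];
                v[j] = a + b;
                v[j + len] = a - b;
            }
        }
    }
}

/* Hypothetical fused SATD for one 8x8 block of 8-bit pixels.
 * stride is the distance in bytes between rows of cur and ref. */
uint32_t satd8x8_fused(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int32_t tmp[8][8];
    uint32_t sum = 0;

    /* Pass 1: take the difference and transform each row. */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            tmp[y][x] = (int32_t)cur[y * stride + x]
                      - (int32_t)ref[y * stride + x];
        wht8(tmp[y]);
    }

    /* Pass 2: transform each column; the final coefficients are
     * only summed up, never stored. */
    for (int x = 0; x < 8; x++) {
        int32_t col[8];
        for (int y = 0; y < 8; y++)
            col[y] = tmp[y][x];
        wht8(col);
        for (int y = 0; y < 8; y++)
            sum += (uint32_t)abs(col[y]);
    }
    return sum;
}
```

A call like satd8x8_fused(cur, ref, 8) on two contiguous 8x8 blocks returns 0 when the blocks are identical. Whether an MMX version of this actually beats a separate transform followed by the usual SAD routine would have to be measured.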
gruel