[XviD-devel] idct+transfer operation in decoder

Edouard Gomez ed.gomez at free.fr
Mon Aug 23 00:48:01 CEST 2004


Hey,

I started hacking a merged (idct+transfer) function (in C for
now because it's easy).

As with the merged (interpolation+averaging) functions, the C
merged functions don't appear to be faster than the separate
ones... so can anyone on this ML tell me whether we're going
to speed up the decoder if this trick is ported to the SIMD
functions?

If so, answer this email and I'll add this to the mmx
functions for now, and mention the change to Christoph Nägeli
for ppc users.

Preliminary patch available here:
http://ed.gomez.free.fr/vrac/xvid-patches/idct_add.diff

You'll notice I haven't decided yet what is faster for
clipping the add result... lookup tables may be faster, but I
wonder whether accessing the LUT doesn't involve as much
memory bandwidth as storing the idct result and reading it
again during the add operation... so I used a conditional
statement instead, in order to stay closer to the replaced
code (transfer_16to8_add_c) and thus really save bandwidth.
But the C version isn't memory bandwidth limited, so I get
roughly the same benchmark times.
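
To make the two clipping options concrete, here is a minimal
sketch of both. It is not taken from the patch: every name in
it (add_clip_branch, add_clip_lut, init_clip_lut, clip_lut) is
made up for illustration, and the LUT variant assumes the idct
residual stays within [-256, 255], which is an assumption of
this sketch and is only checked by an assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant 1: clip the sum with a conditional.
 * dst is an 8x8 block of 8-bit pixels with the given stride,
 * src is an 8x8 block of 16-bit residuals stored contiguously. */
static void add_clip_branch(uint8_t *dst, const int16_t *src, size_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = dst[y * stride + x] + src[y * 8 + x];
            if (v < 0)
                v = 0;
            else if (v > 255)
                v = 255;
            dst[y * stride + x] = (uint8_t)v;
        }
    }
}

/* Variant 2: clip the sum through a lookup table.
 * ASSUMPTION (not from the original post): residuals lie in
 * [-256, 255], so pixel + residual lies in [-256, 510]. */
#define RES_MIN    (-256)
#define RES_MAX    255
#define LUT_OFFSET 256                 /* maps sum -256 to index 0 */
#define LUT_SIZE   (LUT_OFFSET + 510 + 1)

static uint8_t clip_lut[LUT_SIZE];

/* Must be called once before add_clip_lut(). */
static void init_clip_lut(void)
{
    for (int i = 0; i < LUT_SIZE; i++) {
        int v = i - LUT_OFFSET;
        clip_lut[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

static void add_clip_lut(uint8_t *dst, const int16_t *src, size_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int r = src[y * 8 + x];
            assert(r >= RES_MIN && r <= RES_MAX); /* sketch's assumption */
            /* One extra table read per pixel replaces the branches. */
            dst[y * stride + x] = clip_lut[dst[y * stride + x] + r + LUT_OFFSET];
        }
    }
}
```

Both variants produce the same output for in-range residuals.
The LUT one trades the two compares for an extra table read
per pixel, which is exactly the extra memory access in
question above.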

Please comment.

NB: this is probably the last effort I'll make to speed up
    the decoder... I really don't see anything more to
    optimize (either in memory bandwidth or in computation).

-- 
Edouard Gomez

