[XviD-devel] [BUG?] cbp_calc_mmx

Edouard Gomez ed.gomez at free.fr
Sun Oct 26 12:17:36 CET 2003


Edouard Gomez (ed.gomez at free.fr) wrote:
> preliminary results for fdct_mmx seem to show a 100 cycles saving with
> a rolled loop, and 50 with an unrolled loop.

Hmmm, I may be silly, but I forgot to reset the timestamp counter while
benchmarking. The real results are:

ffmpeg mmx fdct unrolled: 339 cycles
ffmpeg mmx fdct rolled: 284 cycles
xvid fdct_mmx (in fdct_mmx.asm): 390 cycles
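
For anyone wanting to reproduce this kind of measurement, here is a
minimal sketch of a cycle-count harness that re-reads the counter before
every run, which avoids the mistake mentioned above. It is not the
harness used for the numbers in this mail. The names read_counter,
bench_min_ticks and dummy_transform are made up for illustration, the
dummy routine stands in for a real fdct, and the code assumes GCC or
Clang on x86 for __rdtsc() (with a coarse clock() fallback elsewhere).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the CPU timestamp counter (GCC/Clang intrinsic). */
static uint64_t read_counter(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 builds: clock ticks, not CPU cycles. */
static uint64_t read_counter(void) { return (uint64_t)clock(); }
#endif

typedef void (*block_fn)(int16_t *block);

/* Hypothetical stand-in for an fdct routine working on an 8x8 block. */
static void dummy_transform(int16_t *block)
{
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)i;
}

/* Return the smallest tick count observed over `runs` calls to `fn`. */
static uint64_t bench_min_ticks(block_fn fn, int16_t *block, int runs)
{
    uint64_t best = UINT64_MAX;

    assert(runs > 0);
    for (int r = 0; r < runs; r++) {
        /* Take a fresh start reading for every run. */
        uint64_t start = read_counter();
        fn(block);
        uint64_t elapsed = read_counter() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Taking the minimum over many runs is one common way to reduce noise from
interrupts and cache misses; an average or median would also be a
reasonable choice.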

I also want to know why skal's versions are unused. If I look at xvid.c,
we bind:
 1/ fdct_mmx, defined in fdct_mmx.asm, for MMX processors
 2/ and that's all, except for SSE2, which would use fdct_sse2 if the
    code were enabled
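
To make the binding question concrete, here is a rough sketch of how
such a CPU-flag dispatch could look if an extended-MMX variant were
bound as well. This is not copied from xvid.c: the flag names, the stub
functions and select_fdct are all hypothetical, and the stubs only write
a marker value so the selection can be observed.

```c
#include <stdint.h>

/* Hypothetical CPU feature bits; the real flag names in xvid may differ. */
enum { CPU_MMX = 1 << 0, CPU_MMXEXT = 1 << 1, CPU_SSE2 = 1 << 2 };

typedef void (*fdct_fn)(int16_t *block);

/* Stubs standing in for the plain C and the assembly implementations. */
static void fdct_c_stub(int16_t *b)    { b[0] = 0; }
static void fdct_mmx_stub(int16_t *b)  { b[0] = 1; }
static void fdct_xmm_stub(int16_t *b)  { b[0] = 2; }
static void fdct_sse2_stub(int16_t *b) { b[0] = 3; }

/* Pick the most specialised routine available; later checks override
 * earlier ones, so the order goes from least to most capable. */
static fdct_fn select_fdct(unsigned cpu_flags)
{
    fdct_fn fn = fdct_c_stub;

    if (cpu_flags & CPU_MMX)
        fn = fdct_mmx_stub;
    if (cpu_flags & CPU_MMXEXT)
        fn = fdct_xmm_stub;
    if (cpu_flags & CPU_SSE2)
        fn = fdct_sse2_stub;
    return fn;
}
```

With this layout, binding another variant is a one-line change in the
selection function, and callers keep using a single function pointer.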

fdct_xmm.asm has:
 - xvid_fdct_sse (which in fact should be fdct_xmm, as it doesn't use
   real SSE instructions, just the extended MMX pshufXX mnemonic)
 - xvid_fdct_mmx, which replaces the pshuf with punpck instructions

The code is very similar to the ffmpeg one, and if I trust the
advertised cycle counts written next to the function definitions,
they're as fast. Why don't we use them? They could help a lot in VHQ
modes, where we do quite a few fdct/idct calls.

PS: First I'll finish the ffmpeg ports so we can compare skal's
versions with the Fabrice Bellard/Michael Niedermayer versions.

-- 
Edouard Gomez

