[XviD-devel] [BUG?] cbp_calc_mmx
Edouard Gomez
ed.gomez at free.fr
Sun Oct 26 12:17:36 CET 2003
Edouard Gomez (ed.gomez at free.fr) wrote:
> Preliminary results for fdct_mmx seem to show a 100-cycle saving with
> a rolled loop, and 50 with an unrolled loop.
Hmmm, I may be silly, but I forgot to reset the timestamp counter while
benchmarking. The real results are:
ffmpeg mmx fdct unrolled: 339 cycles
ffmpeg mmx fdct rolled: 284 cycles
xvid fdct_mmx (in fdct_mmx.asm): 390 cycles
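For anyone who wants to reproduce numbers like these, here is a minimal best-of-N harness. It is a sketch, not the code used for the figures above. It assumes GCC or Clang's `__rdtsc()` intrinsic on x86 and falls back to `clock_gettime` elsewhere. Instead of resetting the counter, it takes a fresh start reading right before each call and keeps the smallest delta. `bench_min_cycles`, `fdct_fn` and `fdct_noop` are hypothetical names, not XviD functions.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime in the non-x86 fallback */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#endif

/* Read a free-running counter: the TSC on x86 (assumption: GCC/Clang
 * intrinsic), a nanosecond clock elsewhere so the sketch still runs. */
static uint64_t read_counter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical signature: an in-place 8x8 forward DCT. */
typedef void (*fdct_fn)(int16_t block[64]);

/* Empty transform: its timing is the harness overhead, which can be
 * subtracted from the real measurements. */
static void fdct_noop(int16_t block[64]) { (void)block; }

/* Best-of-N timing. The start value is re-read before every call, so each
 * sample covers exactly one transform and nothing accumulates across runs. */
static uint64_t bench_min_cycles(fdct_fn fn, const int16_t input[64], int runs)
{
    assert(fn != NULL && input != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    int16_t block[64];
    for (int i = 0; i < runs; i++) {
        memcpy(block, input, sizeof block); /* fresh input every run */
        uint64_t t0 = read_counter();
        fn(block);
        uint64_t dt = read_counter() - t0;
        if (dt < best)
            best = dt;
    }
    return best;
}
```

Taking the minimum rather than the mean filters out interrupts and cache misses, which matters when the differences being compared are only tens of cycles.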
I would also like to know why skal's versions are unused. Looking at
xvid.c, we bind:
1/ fdct_mmx (defined in fdct_mmx.asm) for MMX processors
2/ and that's all, except for SSE2, which would use fdct_sse2 if the
code were enabled
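To make the point concrete, this is roughly what a flag-ordered binding that also picks up an MMX-extension version could look like. It is a simplified sketch, not the actual xvid.c code. The flag names `CPU_MMX`, `CPU_XMM`, `CPU_SSE2`, the function `bind_fdct` and the empty stub transforms are all hypothetical, and `enable_sse2` stands in for "if the code were enabled".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bits; XviD's real flag names may differ. */
#define CPU_MMX  (1u << 0)
#define CPU_XMM  (1u << 1) /* MMX extensions (pshufw and friends) */
#define CPU_SSE2 (1u << 2)

typedef void (*fdct_fn)(int16_t block[64]);

/* Empty stand-ins for the C and assembly transforms. */
static void fdct_c(int16_t *b)    { (void)b; }
static void fdct_mmx(int16_t *b)  { (void)b; }
static void fdct_xmm(int16_t *b)  { (void)b; }
static void fdct_sse2(int16_t *b) { (void)b; }

/* Global pointer the encoder calls through. */
static fdct_fn fdct = fdct_c;

/* Later checks override earlier ones, so the most capable
 * instruction set that is available (and enabled) wins. */
static void bind_fdct(unsigned cpu_flags, int enable_sse2)
{
    fdct = fdct_c;
    if (cpu_flags & CPU_MMX)
        fdct = fdct_mmx;
    if (cpu_flags & CPU_XMM)
        fdct = fdct_xmm;
    if (enable_sse2 && (cpu_flags & CPU_SSE2))
        fdct = fdct_sse2;
    assert(fdct != NULL);
}
```

With this ordering, adding skal's version is a single extra `if`, and a disabled SSE2 path simply falls back to the best MMX variant.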
fdct_xmm.asm has:
- xvid_fdct_sse (which should really be called fdct_xmm, as it doesn't use
real SSE instructions, just the MMX-extension pshufXX mnemonic)
- xvid_fdct_mmx, which replaces the pshufXX instructions with punpck ones
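A scalar C model shows why the two variants differ in instruction count. It is not taken from fdct_xmm.asm, and the broadcast shuffle is an arbitrary example. It only shows that one `pshufw` (MMX extension) can stand in for a short `punpcklwd`/`punpckldq` sequence (plain MMX). The `mmx_reg` type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit MMX register viewed as four 16-bit words, w[0] = lowest. */
typedef struct { uint16_t w[4]; } mmx_reg;

/* pshufw dst, src, imm: word i of the result is src.w[(imm >> 2*i) & 3],
 * i.e. any permutation/broadcast of the four words in one instruction. */
static mmx_reg pshufw(mmx_reg src, unsigned imm)
{
    assert(imm <= 0xFF); /* 8-bit immediate */
    mmx_reg r;
    for (int i = 0; i < 4; i++)
        r.w[i] = src.w[(imm >> (2 * i)) & 3];
    return r;
}

/* punpcklwd a, b: interleave the low words -> a0 b0 a1 b1. */
static mmx_reg punpcklwd(mmx_reg a, mmx_reg b)
{
    mmx_reg r = {{ a.w[0], b.w[0], a.w[1], b.w[1] }};
    return r;
}

/* punpckldq a, b: interleave the low dwords -> a0 a1 b0 b1 (as words). */
static mmx_reg punpckldq(mmx_reg a, mmx_reg b)
{
    mmx_reg r = {{ a.w[0], a.w[1], b.w[0], b.w[1] }};
    return r;
}

/* Same result as pshufw(x, 0x00), using only plain MMX unpacks. */
static mmx_reg broadcast_w0_mmx(mmx_reg x)
{
    mmx_reg t = punpcklwd(x, x); /* x0 x0 x1 x1 */
    return punpckldq(t, t);      /* x0 x0 x0 x0 */
}
```

Other permutations (e.g. a full word reversal, `pshufw` immediate 0x1B) take longer shift/unpack sequences on plain MMX, so the plain-MMX variant usually pays a few extra instructions.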
The code is very similar to the ffmpeg code, and if I trust the cycle
counts advertised next to the function definitions, it is just as fast.
Why don't we use these versions? They could help a lot in VHQ modes,
where we do quite a few fdct/idct calls.
PS: first I'll finish the ffmpeg ports so we can compare skal's
versions with the Fabrice Bellard/Michael Niedermayer versions.
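A fair comparison needs accuracy as well as speed. A minimal sketch of an accuracy check against a double-precision 8x8 DCT-II follows. It assumes the integer transforms work in place on a row-major `int16_t[64]` block and use the usual orthonormal scaling (DC = 8 × mean); both points should be checked against the real code. `fdct_ref`, `fdct_ref_int` and `max_fdct_error` are hypothetical names. The cosines come from an exact table so the sketch does not need libm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*fdct_fn)(int16_t block[64]);

/* cos(m*pi/16) for m >= 0, from a table plus symmetry. */
static double cos_pi16(int m)
{
    static const double c[9] = {
        1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
        0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
        0.19509032201612827, 0.0
    };
    m %= 32;                          /* period 2*pi */
    if (m > 16) m = 32 - m;           /* cos(2*pi - a) = cos(a) */
    return m > 8 ? -c[16 - m] : c[m]; /* cos(pi - a) = -cos(a) */
}

/* Reference DCT-II on block[y*8 + x]:
 * F(v,u) = c(u)c(v)/4 * sum f(y,x) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16),
 * with c(0) = 1/sqrt(2) and c(k) = 1 otherwise. */
static void fdct_ref(const int16_t in[64], double out[64])
{
    for (int v = 0; v < 8; v++)
        for (int u = 0; u < 8; u++) {
            double sum = 0.0;
            for (int y = 0; y < 8; y++)
                for (int x = 0; x < 8; x++)
                    sum += in[y * 8 + x] * cos_pi16((2 * x + 1) * u)
                                         * cos_pi16((2 * y + 1) * v);
            double cu = u ? 1.0 : cos_pi16(4); /* cos(pi/4) = 1/sqrt(2) */
            double cv = v ? 1.0 : cos_pi16(4);
            out[v * 8 + u] = 0.25 * cu * cv * sum;
        }
}

/* Baseline candidate: the reference rounded to the nearest integer. */
static void fdct_ref_int(int16_t block[64])
{
    double out[64];
    fdct_ref(block, out);
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)(out[i] >= 0 ? (int)(out[i] + 0.5)
                                         : -(int)(-out[i] + 0.5));
}

/* Worst absolute error of an in-place integer fdct over random blocks
 * with samples in [-256, 255]. */
static double max_fdct_error(fdct_fn fn, int trials, unsigned seed)
{
    assert(fn != NULL && trials > 0);
    srand(seed);
    double worst = 0.0;
    for (int t = 0; t < trials; t++) {
        int16_t block[64];
        double ref[64];
        for (int i = 0; i < 64; i++)
            block[i] = (int16_t)(rand() % 512 - 256);
        fdct_ref(block, ref); /* reference from the untouched input */
        fn(block);            /* candidate transforms in place */
        for (int i = 0; i < 64; i++) {
            double e = block[i] - ref[i];
            if (e < 0) e = -e;
            if (e > worst) worst = e;
        }
    }
    return worst;
}
```

The rounded reference scores at most 0.5, which gives a floor to judge skal's and the ffmpeg versions against once they are bound to the same `fdct_fn` signature.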
--
Edouard Gomez