[XviD-devel] Re: Speed optimization: sad_mmx.asm
suxen_drol
suxen_drol at hotmail.com
Tue Feb 22 00:24:52 CET 2005
hi,
patch committed.
this passes xvid_bench and an encoding bitstream comparison test.
there is no noticeable speed improvement on my k6-2 (with XVID_CPU_MMX forced).
dark: cvs automagically inserts the date.
cheers,
-- pete
Forwarded by suxen_drol <suxen_drol at hotmail.com>
----------------------- Original Message -----------------------
From: Dark Sylinc <dark_sylinc at yahoo.com.ar>
To: suxen_drol at hotmail.com
Date: Mon, 14 Feb 2005 20:26:20 -0300 (ART)
Subject: Speed optimization: sad_mmx.asm
----
[...]
Most macros did not take register dependencies and
instruction pairing into account.
For example, I repeatedly changed sequences like this:
psubusb mm1, mm4
por mm0, mm1
psubusb mm3, mm5
por mm2, mm3
movq mm1,mm0
movq mm3,mm2
punpcklbw mm0,mm7
punpckhbw mm1,mm7
punpcklbw mm2,mm7
punpckhbw mm3,mm7
paddusw mm0,mm1
paddusw mm6,mm0
paddusw mm2,mm3
paddusw mm6,mm2
Here the first four instructions have register dependency
problems (and therefore do not pair).
There are also pairing issues with the four "punpcklbw" and
"punpckhbw" instructions: MMX has only one shifter unit,
so _two_ consecutive unpack, pack and/or shift instructions
cannot be paired. The same applies to the multiplier unit.
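As a reading aid, here is a scalar C sketch of what the sequence above appears to compute for each byte: an absolute difference built from two saturating subtractions OR-ed together, then widened to 16 bits and accumulated with saturation. It assumes that mm0 and mm2 already hold the saturated difference in the other direction and that mm7 is zero, neither of which the quoted fragment shows. The function names are made up for illustration, and the real code keeps four 16-bit partial sums in mm6 rather than the single accumulator used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte model of psubusb: unsigned subtract, clamped at 0. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Per-word model of paddusw: unsigned add, clamped at 0xFFFF. */
uint16_t sat_add_u16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (uint16_t)(sum > 0xFFFF ? 0xFFFF : sum);
}

/* Hypothetical helper: SAD over n bytes using the OR-of-differences
   trick. One of the two saturating differences is always 0, so the
   OR yields |cur - ref| without a branch or a sign extension. */
uint16_t sad_bytes(const uint8_t *cur, const uint8_t *ref, size_t n)
{
    uint16_t acc = 0;

    assert(cur != NULL && ref != NULL);
    for (size_t i = 0; i < n; i++) {
        uint8_t d = (uint8_t)(sat_sub_u8(cur[i], ref[i]) |
                              sat_sub_u8(ref[i], cur[i]));
        acc = sat_add_u16(acc, d);
    }
    return acc;
}
```

For any pair of buffers, sad_bytes should agree with a plain loop over abs(cur[i] - ref[i]) as long as the total stays below 0xFFFF.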
Reordered this way, the code is pretty well optimized:
psubusb mm1, mm4
psubusb mm3, mm5      ; here we resolved two register dependencies
por mm0, mm1          ; the two 'por' instructions can be paired :)
por mm2, mm3
movq mm1,mm0          ; interleave movq and punpcklbw,
punpcklbw mm0,mm7     ; now they are paired
movq mm3,mm2
punpckhbw mm1,mm7     ; well... two unpacks in a row :(
punpcklbw mm2,mm7     ; (in some cases I moved an earlier 'lea'
                      ; down here to complete the pair)
paddusw mm0,mm1
punpckhbw mm3,mm7
paddusw mm6,mm0
paddusw mm2,mm3
paddusw mm6,mm2
OK, the result is incomprehensible to any human being, but
fast.
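Since the reordered listing is hard to check by eye, one way to gain some confidence is to run both orderings through a toy register model and compare the results. The sketch below is only that: a small C emulation of the six instructions used, following their standard MMX definitions, with m[n] standing for mmN. All function names are illustrative, and the model says nothing about pairing or speed, only about whether the two sequences produce the same register contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one 64-bit MMX register, least-significant byte first. */
typedef struct { uint8_t b[8]; } mmreg;

uint16_t get_w(const mmreg *r, int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(r->b[2 * i] | (r->b[2 * i + 1] << 8));
}

void set_w(mmreg *r, int i, uint16_t v)
{
    r->b[2 * i] = (uint8_t)(v & 0xFF);
    r->b[2 * i + 1] = (uint8_t)(v >> 8);
}

void movq(mmreg *d, const mmreg *s) { *d = *s; }

/* psubusb: per-byte unsigned subtract, clamped at 0. */
void psubusb(mmreg *d, const mmreg *s)
{
    for (int i = 0; i < 8; i++)
        d->b[i] = (uint8_t)(d->b[i] > s->b[i] ? d->b[i] - s->b[i] : 0);
}

void por(mmreg *d, const mmreg *s)
{
    for (int i = 0; i < 8; i++)
        d->b[i] = (uint8_t)(d->b[i] | s->b[i]);
}

/* Interleave four bytes of d and s, starting at byte index `from`. */
static void unpack_bytes(mmreg *d, const mmreg *s, int from)
{
    mmreg x = *d, y = *s;
    for (int i = 0; i < 4; i++) {
        d->b[2 * i] = x.b[from + i];
        d->b[2 * i + 1] = y.b[from + i];
    }
}

void punpcklbw(mmreg *d, const mmreg *s) { unpack_bytes(d, s, 0); }
void punpckhbw(mmreg *d, const mmreg *s) { unpack_bytes(d, s, 4); }

/* paddusw: per-word unsigned add, clamped at 0xFFFF. */
void paddusw(mmreg *d, const mmreg *s)
{
    for (int i = 0; i < 4; i++) {
        uint32_t sum = (uint32_t)get_w(d, i) + get_w(s, i);
        set_w(d, i, (uint16_t)(sum > 0xFFFF ? 0xFFFF : sum));
    }
}

/* Original instruction order (two calls per line only to save space). */
void run_original(mmreg m[8])
{
    psubusb(&m[1], &m[4]);   por(&m[0], &m[1]);
    psubusb(&m[3], &m[5]);   por(&m[2], &m[3]);
    movq(&m[1], &m[0]);      movq(&m[3], &m[2]);
    punpcklbw(&m[0], &m[7]); punpckhbw(&m[1], &m[7]);
    punpcklbw(&m[2], &m[7]); punpckhbw(&m[3], &m[7]);
    paddusw(&m[0], &m[1]);   paddusw(&m[6], &m[0]);
    paddusw(&m[2], &m[3]);   paddusw(&m[6], &m[2]);
}

/* Reordered instruction order from the patch. */
void run_reordered(mmreg m[8])
{
    psubusb(&m[1], &m[4]);   psubusb(&m[3], &m[5]);
    por(&m[0], &m[1]);       por(&m[2], &m[3]);
    movq(&m[1], &m[0]);      punpcklbw(&m[0], &m[7]);
    movq(&m[3], &m[2]);      punpckhbw(&m[1], &m[7]);
    punpcklbw(&m[2], &m[7]); paddusw(&m[0], &m[1]);
    punpckhbw(&m[3], &m[7]); paddusw(&m[6], &m[0]);
    paddusw(&m[2], &m[3]);   paddusw(&m[6], &m[2]);
}

/* Returns 1 if both orderings leave all eight registers identical. */
int orderings_agree(const mmreg init[8])
{
    mmreg a[8], b[8];
    memcpy(a, init, sizeof a);
    memcpy(b, init, sizeof b);
    run_original(a);
    run_reordered(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling orderings_agree on a handful of register states (with mm7 zeroed, as the unpack steps seem to expect) is a cheap sanity check. It does not replace the bitstream comparison against a real encode.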
I tested the changes with a modified version of xvid_encraw
(forcing MMX optimizations only) with the -gmc option, and
everything seems all right (of course, I decompressed the
output afterwards...). Still, I recommend running more tests.
-- pete