[XviD-devel] Re: Speed optimization: sad_mmx.asm

suxen_drol suxen_drol at hotmail.com
Tue Feb 22 00:24:52 CET 2005


hi,

patch committed.
this passes xvid_bench and an encoding bitstream comparison test.
there is no noticeable speed improvement on my k6-2 (with XVID_CPU_MMX forced).

dark: cvs automagically inserts the date.

cheers,
-- pete

Forwarded by suxen_drol <suxen_drol at hotmail.com>
----------------------- Original Message -----------------------
 From:    Dark Sylinc <dark_sylinc at yahoo.com.ar>
 To:      suxen_drol at hotmail.com
 Date:    Mon, 14 Feb 2005 20:26:20 -0300 (ART)
 Subject: Speed optimization: sad_mmx.asm
----

[...]

Most macros did not take register dependencies and
instruction pairing into account.
For example, I repeatedly changed something like this:
  psubusb mm1, mm4
  por mm0, mm1
  psubusb mm3, mm5
  por mm2, mm3

  movq mm1,mm0
  movq mm3,mm2

  punpcklbw mm0,mm7
  punpckhbw mm1,mm7
  punpcklbw mm2,mm7
  punpckhbw mm3,mm7

  paddusw mm0,mm1
  paddusw mm6,mm0
  paddusw mm2,mm3
  paddusw mm6,mm2

Here there are register dependency problems
(and therefore no pairing) within the first four
instructions.
There are also pairing issues with the four
"punpcklbw" and "punpckhbw" instructions. MMX has only
one shifter unit.
This means that _two_ consecutive unpack, pack and/or
shift instructions cannot be paired. The same happens
with the multiplier unit.
This way the code is pretty well optimized:

  psubusb mm1, mm4
  psubusb mm3, mm5 ;Here we solved two register dependencies,
  por mm0, mm1     ;two 'por' instructions can be paired :)
  por mm2, mm3

  movq mm1,mm0      ;Intermix movq and punpcklbw,
  punpcklbw mm0,mm7 ;now they are paired
  movq mm3,mm2
  punpckhbw mm1,mm7 ;Well... two unpack, :(
  punpcklbw mm2,mm7 ;(on some cases I moved a 'lea' that was
                    ;being executed before to continue the pair)
  paddusw mm0,mm1
  punpckhbw mm3,mm7
  paddusw mm6,mm0
  paddusw mm2,mm3
  paddusw mm6,mm2

Ok, they are incomprehensible for any human being, but
fast.
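
To make the reordering easier to follow, here is a rough C sketch of
the pairing rules described above. It is a toy model, not a simulator:
it assumes strictly in-order issue of at most two instructions per
cycle, treats "pairable" as "the second instruction neither reads nor
overwrites the first one's destination, and the two do not both need
the shifter", and ignores memory operands, the multiplier and
everything else. The names (struct insn, can_pair, toy_cycles) are made
up for this illustration; registers mm0..mm7 are written as 0..7.

```c
#include <assert.h>
#include <stddef.h>

/* Functional unit an instruction needs in this toy model. */
enum unit { ALU, SHIFT };

struct insn {
    int dst;        /* register written (also read, except by movq) */
    int src;        /* register read */
    enum unit unit; /* SHIFT for pack/unpack/shift, ALU otherwise */
};

/* Can b issue in the same cycle as a, where a comes first? */
static int can_pair(const struct insn *a, const struct insn *b)
{
    if (a->unit == SHIFT && b->unit == SHIFT)
        return 0;   /* only one shifter unit */
    if (b->src == a->dst || b->dst == a->dst)
        return 0;   /* b needs or overwrites a's result */
    return 1;
}

/* Greedy in-order issue: pair neighbours when allowed, else go alone. */
int toy_cycles(const struct insn *code, size_t n)
{
    int cycles = 0;
    size_t i = 0;

    assert(code != NULL || n == 0);
    while (i < n) {
        if (i + 1 < n && can_pair(&code[i], &code[i + 1]))
            i += 2;
        else
            i += 1;
        cycles++;
    }
    return cycles;
}

/* The sequence as it was before the change. */
const struct insn original_order[14] = {
    {1, 4, ALU},   /* psubusb   mm1, mm4 */
    {0, 1, ALU},   /* por       mm0, mm1 */
    {3, 5, ALU},   /* psubusb   mm3, mm5 */
    {2, 3, ALU},   /* por       mm2, mm3 */
    {1, 0, ALU},   /* movq      mm1, mm0 */
    {3, 2, ALU},   /* movq      mm3, mm2 */
    {0, 7, SHIFT}, /* punpcklbw mm0, mm7 */
    {1, 7, SHIFT}, /* punpckhbw mm1, mm7 */
    {2, 7, SHIFT}, /* punpcklbw mm2, mm7 */
    {3, 7, SHIFT}, /* punpckhbw mm3, mm7 */
    {0, 1, ALU},   /* paddusw   mm0, mm1 */
    {6, 0, ALU},   /* paddusw   mm6, mm0 */
    {2, 3, ALU},   /* paddusw   mm2, mm3 */
    {6, 2, ALU},   /* paddusw   mm6, mm2 */
};

/* The same instructions in the reordered sequence. */
const struct insn reordered[14] = {
    {1, 4, ALU},   /* psubusb   mm1, mm4 */
    {3, 5, ALU},   /* psubusb   mm3, mm5 */
    {0, 1, ALU},   /* por       mm0, mm1 */
    {2, 3, ALU},   /* por       mm2, mm3 */
    {1, 0, ALU},   /* movq      mm1, mm0 */
    {0, 7, SHIFT}, /* punpcklbw mm0, mm7 */
    {3, 2, ALU},   /* movq      mm3, mm2 */
    {1, 7, SHIFT}, /* punpckhbw mm1, mm7 */
    {2, 7, SHIFT}, /* punpcklbw mm2, mm7 */
    {0, 1, ALU},   /* paddusw   mm0, mm1 */
    {3, 7, SHIFT}, /* punpckhbw mm3, mm7 */
    {6, 0, ALU},   /* paddusw   mm6, mm0 */
    {2, 3, ALU},   /* paddusw   mm2, mm3 */
    {6, 2, ALU},   /* paddusw   mm6, mm2 */
};
```

Calling toy_cycles(original_order, 14) and toy_cycles(reordered, 14)
gives 9 and 8 issue slots in this model. Those are counts from the toy
rules only, not measurements; real CPUs (and non-Intel ones such as the
k6-2) schedule differently.
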
I tested them with a modified version of xvid_encraw
(forcing MMX optimizations only) with the -gmc option, and
everything seems
alright (of course, I decompressed them...). Still, I
recommend running more tests.
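
For such extra tests, a scalar reference can help. The C sketch below
models what the quoted sequence appears to compute for one 8-byte
group: psubusb in both directions OR'ed together gives |a - b| per
byte, punpcklbw/punpckhbw widen the bytes to words, and paddusw
accumulates into four 16-bit lanes (mm6). It assumes the instructions
elided before the quoted snippet already produced the
opposite-direction saturated difference, and that the lanes are summed
afterwards; the function names are hypothetical and not part of XviD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte behaviour of psubusb: unsigned subtract, clamped at 0. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Per-word behaviour of paddusw: unsigned add, clamped at 0xFFFF. */
static uint16_t add_sat_u16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + (uint32_t)b;
    return (uint16_t)(s > 0xFFFFu ? 0xFFFFu : s);
}

/* One 8-byte group; acc[] stands in for the four word lanes of mm6. */
static void sad8_lanes(uint16_t acc[4], const uint8_t *cur, const uint8_t *ref)
{
    for (int i = 0; i < 4; i++) {
        /* |a - b| == (a -sat b) | (b -sat a): one side is always 0. */
        uint8_t lo = (uint8_t)(sub_sat_u8(cur[i], ref[i]) |
                               sub_sat_u8(ref[i], cur[i]));
        uint8_t hi = (uint8_t)(sub_sat_u8(cur[i + 4], ref[i + 4]) |
                               sub_sat_u8(ref[i + 4], cur[i + 4]));
        /* Low and high halves are added lane by lane, then accumulated. */
        acc[i] = add_sat_u16(acc[i], add_sat_u16(lo, hi));
    }
}

/* Lane-based SAD over n bytes; n must be a multiple of 8. */
uint32_t sad_lanes(const uint8_t *cur, const uint8_t *ref, size_t n)
{
    uint16_t acc[4] = {0, 0, 0, 0};

    assert(n % 8 == 0);
    for (size_t k = 0; k < n; k += 8)
        sad8_lanes(acc, cur + k, ref + k);
    return (uint32_t)acc[0] + acc[1] + acc[2] + acc[3];
}

/* Plain definition of SAD, used only as a cross-check. */
uint32_t sad_plain(const uint8_t *cur, const uint8_t *ref, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)(cur[i] > ref[i] ? cur[i] - ref[i]
                                          : ref[i] - cur[i]);
    return sum;
}
```

Comparing sad_lanes() against sad_plain() on random blocks checks the
arithmetic only; it says nothing about speed, and the two can differ
once a 16-bit lane saturates.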

-- pete

