[XviD-devel] [PATCH] calc_cbp_sse2 optimization

Mat Hostetter mat at curl.com
Mon Apr 19 19:20:12 CEST 2004


>>>>> "syskin" == Radek Czyz <syskin at ihug.com.au> writes:

 syskin> What I meant is, that once you load the first register of
 syskin> coefficients, you can bitshift the whole register by 16 bits
 syskin> left, and this will "cut" the DC coefficient. Normally you
 syskin> would have to shift it back (logical, not arithmetic shift)
 syskin> but since you're only checking if values are not zero, they
 syskin> can remain shifted.

OK, now I see what you're saying.  That's a good idea.  SSE2 has no
instruction that shifts a whole 128-bit register by an arbitrary number
of bits, but psrldq does the trick since we are shifting by a whole
number of bytes.  Alternatively, pshuflw could be used to overwrite the
DC coefficient with one of the AC coefficients, since as you say we're
just checking for zero.  I like it!
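
To make the pshuflw variant concrete, here is a minimal scalar sketch in
portable C.  It is a model, not the actual SSE2 code.  The function names
(pshuflw_model, any_ac_masked, any_ac_shuffled) are hypothetical and exist
only for this illustration.  It assumes the DC coefficient sits in word 0
of the first register, which is what the ignore_dc mask implies.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of SSE2 pshuflw: each of the low four 16-bit words of dst
 * is selected from the low four words of src by a 2-bit field of imm8;
 * the high four words are copied through unchanged. */
static void pshuflw_model(int16_t dst[8], const int16_t src[8], unsigned imm8)
{
    int16_t tmp[8];
    memcpy(tmp, src, sizeof tmp);            /* lets dst alias src */
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[(imm8 >> (2 * i)) & 3];
    for (int i = 4; i < 8; i++)
        dst[i] = tmp[i];
}

/* Masking approach: ignore word 0 (DC), OR together the seven AC words. */
static int any_ac_masked(const int16_t row[8])
{
    int acc = 0;
    for (int i = 1; i < 8; i++)
        acc |= row[i];
    return acc != 0;
}

/* Shuffle approach: imm8 = 11100101b (0xE5) selects words 1,1,2,3, so
 * word 0 (DC) is overwritten with a copy of word 1 (an AC coefficient).
 * All eight words can then be ORed without the DC value leaking in. */
static int any_ac_shuffled(const int16_t row[8])
{
    int16_t r[8];
    int acc = 0;
    pshuflw_model(r, row, 0xE5);
    for (int i = 0; i < 8; i++)
        acc |= r[i];
    return acc != 0;
}
```

For any row of eight coefficients the two predicates should agree: the
shuffle only duplicates a word that the masked version already tests, and
it drops the one word the mask would have cleared.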

I've appended a patch on top of my previous patch that implements your
idea (using pshuflw).  My unit test and xvid_bench pass with this and
it's faster.

Gotta love all those diff lines starting with "-" :-)

-Mat



--- src/bitstream/x86_asm/cbp_sse2.asm~	2004-04-19 13:09:43.000000000 -0400
+++ src/bitstream/x86_asm/cbp_sse2.asm	2004-04-19 13:02:55.000000000 -0400
@@ -41,7 +41,7 @@
 
 %macro LOOP_SSE2 1
   movdqa xmm0, [edx+(%1)*128]
-  pand xmm0, xmm7
+  pshuflw xmm0, xmm0, 11100101b       ;  overwrite DC coeff with an AC coeff
   movdqa xmm1, [edx+(%1)*128+16]
 
   por xmm0, [edx+(%1)*128+32]
@@ -64,20 +64,6 @@
 %endmacro        
         
 ;=============================================================================
-; Data (Read Only)
-;=============================================================================
-
-%ifdef FORMAT_COFF
-SECTION .rodata data
-%else
-SECTION .rodata data align=16
-%endif
-
-ALIGN 16
-ignore_dc:
-  dw 0, -1, -1, -1, -1, -1, -1, -1
-
-;=============================================================================
 ; Code
 ;=============================================================================
 
@@ -91,7 +77,6 @@
 cglobal calc_cbp_sse2
 calc_cbp_sse2:
   mov edx, [esp+4]         ; coeff[]
-  movdqa xmm7, [ignore_dc] ; mask to ignore dc value
   pxor xmm6, xmm6          ; zero
 
   LOOP_SSE2 0

