[XviD-devel] [PATCH] calc_cbp_sse2 optimization

Mat Hostetter mat at curl.com
Mon Apr 19 16:23:23 CEST 2004


>>>>> "syskin" == Radek Czyz <syskin at ihug.com.au> writes:

 syskin> Mat Hostetter wrote:
 >> This change (against 1.0.0-rc4) speeds up calc_cbp_sse2 from 131
 >> cycles to 112 cycles on the Pentium 4 (for the in-cache case).

 syskin> The most unbelievable thing appears to have happened: it's
 syskin> b0rked...

Weird.  I just tried xvid_bench on my version and its calc_cbp test
passes for my change, in addition to my own test.

 syskin> Maybe merging went wrong? It is quite possible :))

I will be happy to verify your merged version (just email me the whole
cbp_sse2.asm; I don't see it in CVS yet), or I can send you mine.

 syskin>    dw 0, -1, -1, -1, -1, -1, -1, -1

 syskin> But that's two word-sized zeroes, not one. Are you *sure* it
 syskin> is not ignoring the first AC coefficient as well?

Actually I didn't change this array; this is how it used to work.  I'm
relatively new to nasm, but a word (dw) is surely two bytes, so the
single leading 0 is one word-sized zero.  This will mask off just the
DC value and leave the first AC coefficient alone.
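
To illustrate the point about word size, here is a minimal C sketch of
what the mask does to one row of 16-bit coefficients.  The names
ignore_dc, row_has_ac and check_mask are hypothetical, and the loop is
only a scalar model of an AND followed by an OR-reduction, not the
actual cbp_sse2.asm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as "dw 0, -1, -1, -1, -1, -1, -1, -1": eight 16-bit words.
 * (Hypothetical name; only the values come from the quoted line.) */
static const int16_t ignore_dc[8] = { 0, -1, -1, -1, -1, -1, -1, -1 };

/* AND each coefficient with its mask word and OR the results together.
 * Returns nonzero if any coefficient other than the DC is nonzero. */
static int row_has_ac(const int16_t coeff[8])
{
    uint16_t acc = 0;
    for (size_t i = 0; i < 8; i++)
        acc |= (uint16_t)((uint16_t)coeff[i] & (uint16_t)ignore_dc[i]);
    return acc != 0;
}

/* Self-check: only the DC word is masked off. */
static void check_mask(void)
{
    const int16_t dc_only[8]  = { 1234, 0, 0, 0, 0, 0, 0, 0 };
    const int16_t first_ac[8] = { 0, 5, 0, 0, 0, 0, 0, 0 };

    assert(sizeof ignore_dc == 16);  /* eight words fill 128 bits */
    assert(!row_has_ac(dc_only));    /* the DC value is ignored */
    assert(row_has_ac(first_ac));    /* the first AC coefficient survives */
}
```

If the array began with two zero words, the first_ac case would be
masked off as well, which is the failure syskin was asking about.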

 syskin> Just curious, isn't it better just to bitshift the register
 syskin> 16 bits left and achieve the same result?

I did briefly try to generate the constant synthetically, but it's
harder than you think to generate that constant in an SSE register.
Whatever I came up with turned out to be slower than the load.
I'd be happy to be proven wrong though.
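
For reference, one possible synthetic construction can be modelled in
plain C.  It assumes the sequence would be "set every bit" (for example
pcmpeqw of a register with itself) followed by a whole-register byte
shift left by two bytes (pslldq).  That pairing is an assumption, not
necessarily what was tried here, and the model says nothing about cycle
counts.  The type reg128 and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit register as eight 16-bit lanes; lane 0 is the
 * lowest word, where the DC coefficient would sit. */
typedef struct { uint16_t lane[8]; } reg128;

/* Every lane 0xFFFF, as comparing a register with itself would give. */
static reg128 all_ones(void)
{
    reg128 r;
    memset(r.lane, 0xFF, sizeof r.lane);
    return r;
}

/* Shift the whole register left by one 16-bit lane (two bytes),
 * filling the vacated low lane with zero. */
static reg128 shift_left_one_word(reg128 r)
{
    for (int i = 7; i > 0; i--)
        r.lane[i] = r.lane[i - 1];
    r.lane[0] = 0;
    return r;
}

/* Build the mask without loading it from memory. */
static reg128 synth_ignore_dc(void)
{
    return shift_left_one_word(all_ones());
}

/* Self-check: the result matches dw 0, -1, -1, -1, -1, -1, -1, -1. */
static void check_synth(void)
{
    reg128 m = synth_ignore_dc();
    assert(m.lane[0] == 0);
    for (int i = 1; i < 8; i++)
        assert(m.lane[i] == 0xFFFF);
}
```

This takes two dependent operations to produce what a single aligned
load provides, which is consistent with the load being hard to beat.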

-Mat

