[XviD-devel] [PATCH] calc_cbp_sse2 optimization
Mat Hostetter
mat at curl.com
Mon Apr 19 16:23:23 CEST 2004
>>>>> "syskin" == Radek Czyz <syskin at ihug.com.au> writes:
syskin> Mat Hostetter wrote:
>> This change (against 1.0.0-rc4) speeds up calc_cbp_sse2 from 131
>> cycles to 112 cycles on the Pentium 4 (for the in-cache case).
syskin> The most unbelievable thing appears to have happened: it's
syskin> b0rked...
Weird. I just tried xvid_bench on my version and its calc_cbp test
passes for my change, in addition to my own test.
syskin> Maybe merging went wrong? It is quite possible :))
I will be happy to verify your merged version (just email me the whole
cbp_sse2.asm, I don't see it in CVS yet), or I can send you mine.
syskin> dw 0, -1, -1, -1, -1, -1, -1, -1
syskin> But that's two word-sized zeroes, not one. Are you *sure* it
syskin> is not ignoring the first AC coefficient as well?
Actually I didn't change this array; this is how it used to work. I'm
relatively new to nasm, but a word is surely two bytes, so the leading
"dw 0" is a single 16-bit zero. This will just mask off the DC value.
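
To illustrate the point about word size, here is a minimal C sketch of
what that mask does, assuming the table is ANDed with the first eight
16-bit coefficients of a block. The names ignore_dc and mask_first_row
are made up for this example and are not taken from cbp_sse2.asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as "dw 0, -1, -1, -1, -1, -1, -1, -1": eight 16-bit words,
 * 16 bytes in total. The name ignore_dc is made up for this sketch. */
static const uint16_t ignore_dc[8] = {
    0x0000, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF
};

/* Hypothetical helper: AND the first eight coefficients of a block with
 * the mask, element by element (a 128-bit pand does the same in one go). */
static void mask_first_row(const int16_t coeff[8], int16_t out[8])
{
    assert(coeff != NULL && out != NULL);
    for (size_t i = 0; i < 8; i++)
        out[i] = (int16_t)((uint16_t)coeff[i] & ignore_dc[i]);
}
```

With an input row of { 123, 5, 0, -7, 0, 0, 0, 1 }, only the leading 123
(the DC slot) is cleared; the first AC coefficient, 5, survives.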
syskin> Just curious, isn't it better just to bitshift the register
syskin> 16 bits left and achieve the same result?
I did briefly try to generate the constant synthetically, but it's
harder than you think to generate that constant in an SSE register.
Whatever I came up with turned out to be slower than the load.
I'd be happy to be proven wrong though.
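
For reference, one possible way to build the constant without a load,
sketched in C with SSE2 intrinsics. This is only an assumption about what
a synthetic version could look like, not necessarily a sequence that was
tried here, and it makes no claim about speed. Note that the per-lane
shifts (psllw/pslld/psllq) shift within each lane, so the whole-register
byte shift pslldq is used instead. The helper name make_ignore_dc is
hypothetical, and a scalar fallback is included for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: produce the words {0, -1, -1, -1, -1, -1, -1, -1}
 * in out[] without reading a constant table. */
static void make_ignore_dc(uint16_t out[8])
{
    assert(out != NULL);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    /* Comparing a register with itself sets every bit (pcmpeqw). */
    __m128i ones = _mm_cmpeq_epi16(zero, zero);
    /* pslldq: shift the whole 128-bit register left by 2 bytes, which
     * zeroes the lowest 16-bit word and leaves the other seven at -1. */
    __m128i mask = _mm_slli_si128(ones, 2);
    _mm_storeu_si128((__m128i *)out, mask);
#else
    /* Scalar fallback so the sketch still builds without SSE2. */
    out[0] = 0x0000;
    for (size_t i = 1; i < 8; i++)
        out[i] = 0xFFFF;
#endif
}
```

That is two instructions plus whatever it takes to get them scheduled,
against a single aligned load that is usually already in cache.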
-Mat