[XviD-devel] [PATCH] calc_cbp_sse2 optimization

Radek Czyz syskin at ihug.com.au
Mon Apr 19 08:00:23 CEST 2004


Mat Hostetter wrote:
> This change (against 1.0.0-rc4) speeds up calc_cbp_sse2 from 131
> cycles to 112 cycles on the Pentium 4 (for the in-cache case).

The most unbelievable thing appears to have happened: it's b0rked...

I can't test it myself (no sse2 around) but CruNcher gave me this avi:
http://syskin.is.dreaming.org/sse2patch.avi

As far as I can see, some blocks remain not-coded ((cbp & block) == 0) even 
though the encoder thinks they are coded.
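One way to pin down that symptom is to compare the SSE2 output against a plain scalar computation for the same macroblock. The sketch below is a hypothetical reference (`ref_calc_cbp` and `report_cbp_mismatch` are illustrative names, not XviD functions). It assumes 6 blocks of 64 int16_t coefficients per macroblock, that block i maps to cbp bit (5 - i), and that the DC coefficient (index 0) is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scalar reference for the coded block pattern.
 * Assumption: 6 blocks x 64 coefficients, block i -> bit (5 - i),
 * DC (index 0) ignored, any non-zero AC coefficient marks the block coded. */
static uint32_t ref_calc_cbp(const int16_t codes[6 * 64])
{
    uint32_t cbp = 0;
    for (int i = 0; i < 6; i++) {
        const int16_t *blk = codes + i * 64;
        for (int j = 1; j < 64; j++) {   /* j = 0 (DC) is skipped */
            if (blk[j] != 0) {
                cbp |= 1u << (5 - i);
                break;
            }
        }
    }
    return cbp;
}

/* Print every block whose cbp bit differs between the reference and a
 * candidate result; returns the number of mismatching blocks. */
static int report_cbp_mismatch(uint32_t ref, uint32_t got)
{
    int bad = 0;
    for (int i = 0; i < 6; i++) {
        uint32_t bit = 1u << (5 - i);
        if ((ref & bit) != (got & bit)) {
            printf("block %d: reference %s, candidate %s\n", i,
                   (ref & bit) ? "coded" : "not coded",
                   (got & bit) ? "coded" : "not coded");
            bad++;
        }
    }
    return bad;
}
```

Feeding the same coefficients to both routines and passing the two results to `report_cbp_mismatch()` would list exactly the blocks that end up marked as empty while the reference says they are coded.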

Maybe merging went wrong? It is quite possible :))

Aaanyway I'm writing this email right now because I'm still trying to 
learn assembler as much as possible. In the "meantime", I don't 
understand this function. I can see it's trying to mask-out the DC 
coefficient by ANDing it with:

   dw 0, -1, -1, -1, -1, -1, -1, -1

But that's two word-sized zeroes, not one. Are you *sure* it is not 
ignoring the first AC coefficient as well?
I don't have an sse2 reference at hand, but I suspect it's not 
sign-extending the coefficients to 32 bits, because that would be pure waste.
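The mask question can be checked with a scalar model of one XMM register as eight 16-bit lanes, lane 0 holding the DC coefficient after a load. The sketch below compares the mask as quoted above with a hypothetical two-zero-word variant, which would only make sense if the coefficients had been widened to 32 bits. A block whose only non-zero coefficient is the first AC one is the case that tells the two apart. All names here are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit XMM register modeled as 8 signed 16-bit lanes;
 * lane 0 is the lowest word, i.e. coefficient 0 (DC). */
typedef struct { int16_t w[8]; } xmm_words;

/* The mask as quoted in the mail. */
static const int16_t mask_as_quoted[8] = { 0, -1, -1, -1, -1, -1, -1, -1 };
/* Hypothetical variant with two zero words (one zero dword). */
static const int16_t mask_two_zeros[8] = { 0,  0, -1, -1, -1, -1, -1, -1 };

/* Lane-wise AND, the scalar equivalent of PAND. */
static xmm_words pand(xmm_words a, const int16_t m[8])
{
    xmm_words r;
    for (int i = 0; i < 8; i++)
        r.w[i] = (int16_t)(a.w[i] & m[i]);
    return r;
}

/* 1 if any lane is non-zero, i.e. the block would count as coded. */
static int any_nonzero(xmm_words a)
{
    int16_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc |= a.w[i];
    return acc != 0;
}
```

With the mask as quoted, `{0, 5, 0, 0, 0, 0, 0, 0}` still reads as coded; with two zero words it reads as empty, which is exactly the kind of block that would get a cleared cbp bit.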

Just curious, isn't it better to just shift the register by 16 bits 
to the left and achieve the same result?
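For the "is any coefficient non-zero" test a shift does give the same answer as the mask. One detail: with x86 little-endian loads the DC coefficient is the least significant word, so the SSE2 instruction that discards it is PSRLDQ xmm, 2 (whole-register right shift by 2 bytes); PSLLDQ would keep DC and drop coefficient 7 instead. The "left" in the question matches a memory-order drawing of the register. The scalar model below (hypothetical names) only checks the equivalence; it says nothing about which variant is faster on a Pentium 4.

```c
#include <assert.h>
#include <stdint.h>

/* One XMM register as 8 signed 16-bit lanes; lane 0 = DC. */
typedef struct { int16_t w[8]; } xmm_words;

/* Model of PSRLDQ xmm, 2: lane 0 (DC) is shifted out,
 * a zero word enters lane 7. */
static xmm_words psrldq_2(xmm_words a)
{
    xmm_words r;
    for (int i = 0; i < 7; i++)
        r.w[i] = a.w[i + 1];
    r.w[7] = 0;
    return r;
}

/* Model of PAND with { 0, -1, -1, -1, -1, -1, -1, -1 }: clear lane 0. */
static xmm_words clear_dc(xmm_words a)
{
    a.w[0] = 0;
    return a;
}

/* 1 if any lane is non-zero. */
static int any_nonzero(xmm_words a)
{
    for (int i = 0; i < 8; i++)
        if (a.w[i] != 0)
            return 1;
    return 0;
}
```

Both paths drop exactly one word, the DC coefficient, so `any_nonzero(psrldq_2(v))` and `any_nonzero(clear_dc(v))` agree for every input; the shift just avoids loading a mask constant.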

:)
Radek

