[XviD-devel] [PATCH] calc_cbp_sse2 optimization
Radek Czyz
syskin at ihug.com.au
Mon Apr 19 08:00:23 CEST 2004
Mat Hostetter wrote:
> This change (against 1.0.0-rc4) speeds up calc_cbp_sse2 from 131
> cycles to 112 cycles on the Pentium 4 (for the in-cache case).
The most unbelievable thing appears to have happened: it's b0rked...
I can't test it myself (no sse2 around) but CruNcher gave me this avi:
http://syskin.is.dreaming.org/sse2patch.avi
As far as I can see, some blocks remain not-coded (cbp&block==0) even if
the encoder thinks they are coded.
Maybe merging went wrong? It is quite possible :))
Aaanyway, I'm writing this email right now because I'm still trying to
learn as much assembler as possible. In the "meantime", I don't
understand this function. I can see it's trying to mask out the DC
coefficient by ANDing it with:
dw 0, -1, -1, -1, -1, -1, -1, -1
But that's two word-sized zeroes, not one. Are you *sure* it is not
ignoring the first AC coefficient as well?
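For illustration only, here is a minimal C sketch of what that AND does, assuming the line above is NASM syntax where every `dw` operand is one 16-bit word, and assuming the constant is quoted exactly as it appears in the patch (it may not be). The names `dc_mask`, `mask_row` and `cleared_lanes` are made up for this sketch and are not taken from the XviD source. Under those assumptions the mask clears a single 16-bit lane:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the quoted mask: eight 16-bit words, the first
   one zero and the other seven all-ones (-1 == 0xFFFF). */
const int16_t dc_mask[8] = { 0, -1, -1, -1, -1, -1, -1, -1 };

/* AND one row of eight coefficients with the mask, lane by lane,
   which is what a 128-bit PAND does to eight packed words. */
void mask_row(const int16_t in[8], int16_t out[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (int16_t)(in[i] & dc_mask[i]);
}

/* Count how many lanes the mask forces to zero. */
int cleared_lanes(void)
{
    int n = 0;
    for (size_t i = 0; i < 8; i++)
        if (dc_mask[i] == 0)
            n++;
    return n;
}
```

If the constant in the real patch is declared differently (for example with `dd` instead of `dw`), the number of cleared words changes accordingly.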
I don't have an SSE2 reference at hand, but I suspect it's not
sign-extending the coefficients to 32 bits because that would be pure waste.
Just curious, isn't it better to just bitshift the register 16 bits left
and achieve the same result?
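A rough C model of the shift idea, under two assumptions that are not confirmed by the patch: the row is loaded with the DC coefficient in the lowest 16-bit word of the register (the usual little-endian layout), and the shift is a whole-register byte shift such as PSLLDQ/PSRLDQ by 2 bytes. The function names are hypothetical. With that layout a left shift discards the last coefficient of the row, while a right shift is the one that discards the DC:

```c
#include <stdint.h>
#include <stddef.h>

/* Whole-register left shift by one 16-bit word (modelled on PSLLDQ by
   2 bytes): each word moves one lane up, lane 0 becomes zero and the
   word that was in lane 7 is lost. */
void shift_left_word(const int16_t in[8], int16_t out[8])
{
    out[0] = 0;
    for (size_t i = 1; i < 8; i++)
        out[i] = in[i - 1];
}

/* Whole-register right shift by one 16-bit word (modelled on PSRLDQ by
   2 bytes): each word moves one lane down, lane 7 becomes zero and the
   word that was in lane 0 (the DC under our assumption) is lost. */
void shift_right_word(const int16_t in[8], int16_t out[8])
{
    for (size_t i = 0; i < 7; i++)
        out[i] = in[i + 1];
    out[7] = 0;
}

/* Returns 1 if any of the eight words is nonzero, 0 otherwise. */
int any_nonzero(const int16_t v[8])
{
    for (size_t i = 0; i < 8; i++)
        if (v[i] != 0)
            return 1;
    return 0;
}
```

Note that the per-lane shifts (PSLLW, PSLLD, PSLLQ) behave differently: they shift inside each lane and never move a word across a 64-bit boundary, so they are not modelled here.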
:)
Radek