[XviD-devel] [PATCH] calc_cbp_sse2 optimization
Radek Czyz
syskin at ihug.com.au
Mon Apr 19 16:41:30 CEST 2004
Mat Hostetter wrote:
> I will be happy to verify your merged version (just email me the whole
> cbp_sse2.asm, I don't see it in CVS yet), or I can send you mine.
CruNcher sleeps now, I think :) Anyway, don't worry about that, I mostly
wrote the email for the second part.
> syskin> dw 0, -1, -1, -1, -1, -1, -1, -1
>
> syskin> But that's two word-sized zeroes, not one. Are you *sure* it
> syskin> is not ignoring the first AC coefficient as well?
>
> Actually I didn't change this array, this is how it used to work. I'm
> relatively new to nasm but a word is surely two bytes. So this will
> just mask off the DC value.
Yes, I know you didn't change that; I was actually pasting it from the
original code. A word is surely two bytes, yes - the thing is, doesn't
"dw" stand for "double word"? :) If it doesn't then I learned something :)
> I did briefly try to generate the constant synthetically (...)
[my lack of clarity again...]
What I meant is that once you load the first register of coefficients,
you can shift the whole register by 16 bits, and this will "cut" the DC
coefficient. With the DC word in the lowest 16 bits of the register (as
on little-endian x86), that is a logical right shift rather than a left
one. Normally you would have to shift it back, but since you're only
checking if values are not zero, they can remain shifted.
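
A minimal C sketch of the shift idea, modelling one 64-bit register that holds four 16-bit coefficients as a uint64_t. The function names and the explicit little-endian packing are assumptions made for illustration; none of this is code from the patch. In this model the DC word sits in bits 0..15, so a logical right shift by 16 drops it, and the value is not shifted back because only the zero/nonzero test matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four 16-bit coefficients into one 64-bit value the way a
 * little-endian 64-bit load would: coeffs[0] (DC) lands in bits 0..15.
 * Hypothetical helper, for illustration only. */
static uint64_t pack4_le(const int16_t *coeffs)
{
    uint64_t reg = 0;
    assert(coeffs != NULL);
    for (size_t i = 0; i < 4; i++)
        reg |= (uint64_t)(uint16_t)coeffs[i] << (16 * i);
    return reg;
}

/* Returns 1 if any of coeffs[1..3] is nonzero, ignoring the DC value. */
static int ac_nonzero_by_shift(const int16_t *coeffs)
{
    uint64_t reg = pack4_le(coeffs);
    reg >>= 16;  /* unsigned shift is logical: the DC word falls off the low end */
    return reg != 0;
}
```

With only the DC value set, ac_nonzero_by_shift returns 0; with any of the other three coefficients set, it returns 1.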
Alternatively, you can read the first row from [coeffs+2] instead of
[coeffs] - the 9th coefficient will be checked two times but again, it
doesn't matter, as you're only checking if they are all zero.
The second solution needs an unaligned read, so it is probably worse.
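
The offset-read alternative can be sketched in scalar C as well, assuming a 64-coefficient block of 16-bit values. The function name is hypothetical, and the loops only stand in for what the wide loads would cover; this is not the actual routine. Reading the "first row" from byte offset 2 corresponds to starting at index 1, so indices 1..8 are covered and index 8 is then seen again by the second row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any AC coefficient of a 64-entry block is nonzero.
 * Hypothetical scalar model of the "[coeffs+2]" idea. */
static int block_has_ac_offset_read(const int16_t *coeffs)
{
    int acc = 0;
    assert(coeffs != NULL);

    /* "First row" read one coefficient (2 bytes) late: indices 1..8.
     * The DC value at index 0 is never touched. */
    for (size_t i = 1; i <= 8; i++)
        acc |= coeffs[i];

    /* Remaining rows read from their usual positions: indices 8..63.
     * Index 8 is OR-ed in a second time, which cannot change the result. */
    for (size_t i = 8; i < 64; i++)
        acc |= coeffs[i];

    return acc != 0;
}
```

A block whose only nonzero entry is the DC value gives 0; setting index 8 or index 63 gives 1.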
:)
Radek