[XviD-devel] [PATCH] calc_cbp_sse2 optimization

Radek Czyz syskin at ihug.com.au
Mon Apr 19 16:41:30 CEST 2004


Mat Hostetter wrote:

> I will be happy to verify your merged version (just email me the whole
> cbp_sse2.asm, I don't see it in CVS yet), or I can send you mine.

CruNcher sleeps now, I think :) Anyway, don't worry about that, I mostly 
wrote the email for the second part.

>  syskin>    dw 0, -1, -1, -1, -1, -1, -1, -1
> 
>  syskin> But that's two word-sized zeroes, not one. Are you *sure* it
>  syskin> is not ignoring the first AC coefficient as well?
> 
> Actually I didn't change this array, this is how it used to work.  I'm
> relatively new to nasm but a word is surely two bytes.  So this will
> just mask off the DC value.

Yes, I know you didn't change that; I was actually pasting it from the
original code. A word is surely two bytes, yes - the thing is, doesn't
"dw" stand for "double word"? :) If it doesn't then I learned something :)

> I did briefly try to generate the constant synthetically (...)

[my lack of clarity again...]
What I meant is that once you load the first register of coefficients,
you can bitshift the whole register by 16 bits left, and this will "cut"
the DC coefficient. Normally you would have to shift it back (with a
logical, not arithmetic, shift), but since you're only checking whether
the values are nonzero, they can remain shifted.
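
A rough C sketch of the shift idea, emulating the register with two
64-bit halves. The names pack4 and first_row_has_ac are invented for
illustration. One assumption to flag: the sketch puts the DC term in
the lowest 16 bits, as a little-endian load from [coeffs] would, and
with that layout the shift that drops it is a right shift. Whichever
direction applies, the point is the same: the shifted value is only
tested against zero, so it never has to be shifted back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four 16-bit coefficients into one 64-bit value, lowest index in the
   lowest bits, mimicking a little-endian register load. */
static uint64_t pack4(const int16_t *c)
{
    return  (uint64_t)(uint16_t)c[0]
         | ((uint64_t)(uint16_t)c[1] << 16)
         | ((uint64_t)(uint16_t)c[2] << 32)
         | ((uint64_t)(uint16_t)c[3] << 48);
}

/* Nonzero if any AC coefficient of the first row (indices 1..7) is nonzero.
   The DC term is shifted out and the value is never shifted back. */
static int first_row_has_ac(const int16_t *row)
{
    assert(row != NULL);
    uint64_t lo = pack4(row);      /* coefficients 0..3, DC in bits 0..15 */
    uint64_t hi = pack4(row + 4);  /* coefficients 4..7 */
    lo >>= 16;                     /* logical shift: the DC word falls off */
    return (lo | hi) != 0;
}
```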

Alternatively, you can read the first row from [coeffs+2] instead of
[coeffs] - the 9th coefficient will be checked twice, but again, it
doesn't matter, as you're only checking whether they are all zero.
The second solution needs an unaligned read, so it is probably worse.
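
The offset-read variant, sketched in plain C. The 64-coefficient block
size and the name block_has_ac_offset are assumptions made for the
example. The first eight words are read starting one coefficient in
(the [coeffs+2] byte offset), the remaining rows are read normally, and
coefficient 8 is simply ORed in twice:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if any AC coefficient (indices 1..63) of the block is nonzero.
   The DC term at index 0 is skipped by starting the first row one word late. */
static int block_has_ac_offset(const int16_t *block)
{
    assert(block != NULL);
    uint16_t acc = 0;

    /* "First row" read from block + 1: covers indices 1..8. */
    for (size_t i = 0; i < 8; i++)
        acc |= (uint16_t)block[1 + i];

    /* Remaining rows: indices 8..63, so index 8 is seen a second time. */
    for (size_t i = 8; i < 64; i++)
        acc |= (uint16_t)block[i];

    return acc != 0;
}
```

ORing the same coefficient in twice cannot change whether the result
is zero, which is why the overlap is harmless.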

:)
Radek

