[XviD-devel] sse2

peter ross xvid-devel@xvid.org
Tue, 30 Jul 2002 15:31:41 +1000


okay, i had a look at the sse2 dequant code on friday. yep, it's slower than 
xmm. but replacing the loop increment and saturation code makes sse2 
~20% faster than xmm.
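for anyone who hasn't looked at the saturation step: here's a minimal sketch of what it boils down to, a plain C reference next to an sse2 version that clamps eight 16-bit coefficients per instruction pair (pmaxsw/pminsw). this is not the xvid code; the function names are hypothetical, the [-2048, 2047] range is assumed from the usual mpeg-4 coefficient saturation, and it assumes the dequantized values already fit in 16 bits.

```c
#include <emmintrin.h>
#include <stdint.h>

/* scalar reference: clamp each coefficient to [-2048, 2047]
   (range assumed from mpeg-4 saturation; hypothetical helper) */
static void clamp_scalar(int16_t *c, int n)
{
    for (int i = 0; i < n; i++) {
        if (c[i] > 2047)  c[i] = 2047;
        if (c[i] < -2048) c[i] = -2048;
    }
}

/* sse2 version: pmaxsw then pminsw on eight 16-bit lanes at once.
   assumes n is a multiple of 8; unaligned loads/stores (movdqu)
   so the buffer's alignment doesn't matter here */
static void clamp_sse2(int16_t *c, int n)
{
    const __m128i lo = _mm_set1_epi16(-2048);
    const __m128i hi = _mm_set1_epi16(2047);
    for (int i = 0; i < n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(c + i));
        v = _mm_min_epi16(_mm_max_epi16(v, lo), hi);
        _mm_storeu_si128((__m128i *)(c + i), v);
    }
}
```

both should give identical results; the sse2 one just does it eight lanes at a time.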

i noticed we have a problem with sse2 alignment:
nasm can only guarantee the .data section to be 4-byte aligned, so the 
following symbol is not 16-byte aligned under plain-old msvc.

.text
align 16
sse2_value    times 8 dw 1
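the reason this matters: movdqa (_mm_load_si128 in C) faults on an address that isn't on a 16-byte boundary, while movdqu (_mm_loadu_si128) works anywhere but is slower. a rough sketch of checking for that and falling back, with hypothetical helper names (is_aligned16, load_constant) that aren't in our tree:

```c
#include <emmintrin.h>
#include <stdint.h>

/* nonzero if p sits on a 16-byte boundary, which movdqa
   (_mm_load_si128) requires; otherwise it raises a fault */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* load eight 16-bit constants, falling back to the slower
   unaligned load (movdqu) when the table isn't 16-byte aligned */
static __m128i load_constant(const int16_t *table)
{
    if (is_aligned16(table))
        return _mm_load_si128((const __m128i *)table);
    return _mm_loadu_si128((const __m128i *)table);
}
```

in the real asm you'd rather just get the alignment right than branch, but the check is handy in a debug build to catch a misaligned table.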

the only way i could make the value 16-byte aligned was to export the 
symbol:

cglobal sse2_value
.text
align 16
sse2_value    times 8 dw 1
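for comparison, on the C side a compiler with C11 support can force the same alignment directly. this is only a sketch assuming a C11 compiler (_Alignas), not something plain-old msvc offers; add_one_8 is a hypothetical helper that does the paddw with the aligned constant:

```c
#include <emmintrin.h>
#include <stdint.h>

/* C counterpart of "sse2_value times 8 dw 1": eight 16-bit ones,
   forced onto a 16-byte boundary with C11 _Alignas */
static _Alignas(16) const int16_t sse2_value[8] = {1, 1, 1, 1, 1, 1, 1, 1};

/* add 1 to each of eight 16-bit lanes (paddw, wraps on overflow);
   _mm_load_si128 (movdqa) is safe because of the _Alignas above */
static void add_one_8(int16_t *v)
{
    __m128i ones = _mm_load_si128((const __m128i *)sse2_value);
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    _mm_storeu_si128((__m128i *)v, _mm_add_epi16(x, ones));
}
```

note paddw wraps rather than saturates, so 32767 + 1 comes out as -32768.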

i assume sp5 + the processor pack will fix this.

the mmx, xmm and sse2 dequant code is nearly identical, and could easily be 
macro'ized.

cya
-- pete
