[XviD-devel] Adding SSE2 asm codes for color space transforming function

Jason Garrett-Glaser darkshikari at gmail.com
Fri Jun 19 09:03:50 CEST 2009


> movlps [edi + 32],xmm0   ;  movlps + movhps are faster than one movdqu :)

Only on Athlon 64, probably.

On Phenom and Nehalem it will be most definitely slower, and probably
slower on basically everything else too.

Also, since the shuffle unit is slow on Conroe, that code will
almost certainly be slower than the MMX version on Conroe.
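
For reference, a minimal sketch of the two store strategies being compared,
written with C intrinsics rather than the patch's asm. The function names
(store_unaligned, store_split, stores_match) are made up for illustration and
are not from the patch. It only shows that both variants write the same 16
bytes; which one is faster depends on the CPU, as noted above, and has to be
measured on the target hardware.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#include <stdint.h>
#include <string.h>

/* One unaligned 16-byte store; typically compiles to movdqu. */
static void store_unaligned(uint8_t *dst, __m128i v)
{
    _mm_storeu_si128((__m128i *)dst, v);
}

/* Two 8-byte stores: low half (movlps), then high half (movhps). */
static void store_split(uint8_t *dst, __m128i v)
{
    __m128 f = _mm_castsi128_ps(v);        /* reinterpret bits, no conversion */
    _mm_storel_pi((__m64 *)dst, f);        /* bytes 0..7  */
    _mm_storeh_pi((__m64 *)(dst + 8), f);  /* bytes 8..15 */
}

/* Returns 1 if both strategies leave identical bytes in memory. */
static int stores_match(void)
{
    uint8_t a[48], b[48];
    __m128i v = _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                             7, 6, 5, 4, 3, 2, 1, 0);

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    /* Offset by 1 so the destination is deliberately unaligned. */
    store_unaligned(a + 1, v);
    store_split(b + 1, v);

    return memcmp(a, b, sizeof a) == 0;
}
```

Timing the two functions in a loop on each target CPU would be the way to
check the speed claims; the sketch above does not attempt that.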

Dark Shikari

