[XviD-devel] sse2

Michael Militzer xvid-devel@xvid.org
Thu, 25 Jul 2002 13:15:53 +0200


Hi,

----- Original Message -----
From: "peter ross" <suxen_drol@hotmail.com>
To: <xvid-devel@xvid.org>
Sent: Thursday, July 25, 2002 9:58 AM
Subject: [XviD-devel] sse2


> i've just run some xvid sse2 tests. most functions seem to work. there
> doesn't seem to be much speed improvement over mmx/xmm.
>
> notes:
> - someone has written newer sad16_sse2 and dev16_sse2 functions which perform
> alignment checks. these funcs appear much slower than dan's old functions.
> who wrote this code?

all the sse2 code was written by Dmitry Rozhdestvensky <dmitry@servertd.spb.ru>, except the quant sse and cbp sse functions,
which were written by Daniel. Dmitry rewrote the sse2 sad code because, as he told me, Dan's code assumed correctly aligned
data, which can't be guaranteed in practice. I have no p4 box, so I have to trust him...
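
To illustrate the alignment issue, here is a minimal sketch in C with SSE2 intrinsics. It is not Dmitry's or Dan's code: the names sad16_ref, is_aligned16 and sad16_sketch are hypothetical, and the real XviD routines are written in assembly. The sketch assumes a 16x16 block of 8-bit pixels and uses aligned loads only when both pointers and the stride are multiples of 16, falling back to unaligned loads otherwise.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C reference: sum of absolute differences over a 16x16 block. */
static uint32_t sad16_ref(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    uint32_t sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            int d = cur[y * stride + x] - ref[y * stride + x];
            sum += (uint32_t)(d < 0 ? -d : d);
        }
    }
    return sum;
}

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Sketch only: picks aligned or unaligned 128-bit loads per call. */
static uint32_t sad16_sketch(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    /* Every row stays aligned only if the stride is a multiple of 16 too. */
    int aligned = is_aligned16(cur) && is_aligned16(ref) && (stride % 16 == 0);

    for (int y = 0; y < 16; y++) {
        __m128i a, b;
        if (aligned) {
            a = _mm_load_si128((const __m128i *)cur);
            b = _mm_load_si128((const __m128i *)ref);
        } else {
            a = _mm_loadu_si128((const __m128i *)cur);
            b = _mm_loadu_si128((const __m128i *)ref);
        }
        /* psadbw leaves one partial sum in each 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
        cur += stride;
        ref += stride;
    }
    /* Add the low and high partial sums. */
    return (uint32_t)_mm_cvtsi128_si32(acc)
         + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
    /* No SSE2 available at compile time: use the C reference. */
    return sad16_ref(cur, ref, stride);
#endif
}
```

During motion search the reference pointer moves pixel by pixel, so it will rarely be 16-byte aligned; comparing sad16_sketch against sad16_ref on such offsets is a cheap way to check that a routine does not silently rely on alignment.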

> btw, the new dev16_sse2 is not functionally equivalent to dev16_c

> - fdct_sse2 is also not functionally equivalent to fdct_mmx (less accurate)
> ...and causes the bitstream to increase in size: an extra 100kb for a 24meg
> avi.  fdct_sse2 is about 90% faster than the mmx version.

well, the fdct doesn't have to be that precise (at least it can be much less accurate than the idct). Have you measured the
deviation between fdct_mmx and fdct_sse2? A 100kb difference in file size is quite a lot, so I'd suppose that the sse2
variant is much less accurate?
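
One way to measure that deviation, as a rough sketch: run both fdct variants on copies of the same input block and compare the resulting coefficients. The helper below is hypothetical (coeff_dev and compare_coeffs are not part of XviD) and assumes the coefficients are 16-bit signed values, 64 per 8x8 block.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical result type for comparing two coefficient arrays. */
typedef struct {
    int    max_abs;  /* largest absolute coefficient difference */
    double mse;      /* mean squared difference */
} coeff_dev;

/* Compare n coefficients produced by two transform implementations. */
static coeff_dev compare_coeffs(const int16_t *a, const int16_t *b, size_t n)
{
    coeff_dev r = { 0, 0.0 };
    double sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        int d = (int)a[i] - (int)b[i];
        int ad = d < 0 ? -d : d;
        if (ad > r.max_abs)
            r.max_abs = ad;
        sq += (double)d * (double)d;
    }
    if (n > 0)
        r.mse = sq / (double)n;
    return r;
}
```

Collecting max_abs and mse over many blocks from real footage would show whether the sse2 variant is only off by rounding or is systematically less accurate.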

> is anyone using a p4, and can confirm the above? i'd like to enable the sse2
> optimizations for public use, as they're currently #def'd out.

I don't have a p4, but Vladimir, who tested the current sse2 code, sent me a profiling log and it seems that the quant
sse2 functions don't give much improvement, the dequant sse2 functions seem to be even slower than the mmx'ed ones...

bye
Michael