[XviD-devel] Hadamard Transform
Christoph Lampert
xvid-devel@xvid.org
Fri, 6 Sep 2002 11:04:15 +0200 (CEST)
On Fri, 6 Sep 2002, monsti wrote:
> >> a' = (a + b + c + d + e + f + g + h) /8
> >> b' = (a + b + c + d - e - f - g - h) /8
> >> c' = (a + b - c - d - e - f + g + h) /8
> >> d' = (a + b - c - d + e + f - g - h) /8
> >> e' = (a - b - c + d + e - f - g + h) /8
> >> f' = (a - b - c + d - e + f + g - h) /8
> >> g' = (a - b + c - d - e + f - g + h) /8
> >> h' = (a - b + c - d + e - f + g - h) /8
>
> Hi all... I had some time and tried to MMX (SSE) this
> expression. My result is in the attachment. I haven't tested it... Maybe
> it is even slower than the C version... I'll be happy if this junk is useful...
Hi,
now the MMXers around can tell me something...
monsti MMXed this calculation, so you really end up transforming 8 bytes
into 8 bytes. The MMX registers hold a to d and e to h, and the calculation
is done on these. For an 8x8 block you have to call the routine 8 times.
However, I had thought it would be possible to do the calculations without
multiplication and by calculating several (8 or 4) rows of this in
parallel.
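To make "without multiplication" concrete, here is a minimal sketch in plain C, assuming the usual three-stage butterfly scheme. The function name hadamard8 is made up for illustration, the /8 scaling is left out, and the results come out in the natural (Sylvester) order of the Hadamard matrix rather than in the order of the quoted list.

```c
#include <stdint.h>

/* In-place 8-point Hadamard transform built only from additions and
 * subtractions: 3 butterfly stages of 8 operations each, 24 in total.
 * Output is unnormalised (no /8): v[k] = sum_j s(k,j) * input[j], where
 * s(k,j) is -1 if (k & j) has an odd number of set bits, else +1. */
static void hadamard8(int16_t v[8])
{
    for (int step = 1; step < 8; step <<= 1) {
        for (int i = 0; i < 8; i += 2 * step) {
            for (int j = i; j < i + step; j++) {
                int16_t x = v[j];
                int16_t y = v[j + step];
                v[j]        = (int16_t)(x + y);  /* sum half of the butterfly  */
                v[j + step] = (int16_t)(x - y);  /* difference half            */
            }
        }
    }
}
```

With 8-bit input samples the sums stay within roughly +-2040, so 16-bit words are wide enough here.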
So one MMX register would hold a_1,a_2,a_3,...,a_8 (or up to a_4), another
one b_1,b_2,b_3,...,b_8 (or up to b_4), etc. Or let's just say we take the
same formula as we have, but a,b,c,d,e,f,g,h are _vectors_ of 4 or 8
components.
Would this be possible, too? Or is there one MMX register missing for
that?
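A rough sketch of the vector idea in portable C, not real MMX code: the struct vec4 is a hypothetical stand-in for one 64-bit MMX register viewed as four signed 16-bit words, and vadd/vsub only imitate what the packed-word instructions paddw/psubw do. All names are invented for illustration, the /8 scaling is again left out, and the output order is the natural Hadamard order.

```c
#include <stdint.h>

/* Hypothetical stand-in for one MMX register: four signed 16-bit words. */
typedef struct { int16_t w[4]; } vec4;

/* Lane-wise addition, imitating paddw. */
static vec4 vadd(vec4 x, vec4 y)
{
    vec4 r;
    for (int n = 0; n < 4; n++)
        r.w[n] = (int16_t)(x.w[n] + y.w[n]);
    return r;
}

/* Lane-wise subtraction, imitating psubw. */
static vec4 vsub(vec4 x, vec4 y)
{
    vec4 r;
    for (int n = 0; n < 4; n++)
        r.w[n] = (int16_t)(x.w[n] - y.w[n]);
    return r;
}

/* r[0..7] play the role of a..h; r[k].w[n] is element k of row n.
 * Four rows are transformed at once with the same 24 add/sub steps. */
static void hadamard8_x4(vec4 r[8])
{
    for (int step = 1; step < 8; step <<= 1) {
        for (int i = 0; i < 8; i += 2 * step) {
            for (int j = i; j < i + step; j++) {
                vec4 t = r[j];                    /* the one extra temporary */
                r[j]        = vadd(t, r[j + step]);
                r[j + step] = vsub(t, r[j + step]);
            }
        }
    }
}
```

Written this way, each butterfly needs one temporary in addition to the eight data vectors, i.e. nine live values, while MMX has only eight registers (mm0 to mm7). So it looks as if at least one vector would have to stay in memory, unless someone finds a trick to avoid the temporary. The rows would also have to be loaded transposed so that each register holds the same column of four different rows.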
gruel