[XviD-devel] Hadamard Transform

Christoph Lampert xvid-devel@xvid.org
Fri, 6 Sep 2002 11:04:15 +0200 (CEST)


On Fri, 6 Sep 2002, monsti wrote:

> >> a'  =  (a + b + c + d + e + f + g + h) /8
> >> b'  =  (a + b + c + d - e - f - g - h) /8
> >> c'  =  (a + b - c - d - e - f + g + h) /8
> >> d'  =  (a + b - c - d + e + f - g - h) /8
> >> e'  =  (a - b - c + d + e - f - g + h) /8
> >> f'  =  (a - b - c + d - e + f + g - h) /8
> >> g'  =  (a - b + c - d - e + f - g + h) /8
> >> h'  =  (a - b + c - d + e - f + g - h) /8
> 
> Hi all... I had some time and tried to write this expression in MMX
> (SSE). My result is in the attachment. I haven't tested it. Maybe it
> is even slower than the C version. I'll be happy if this junk is useful.

Hi,

Now the MMXers around can tell me something...
monsti MMXed this calculation, so you really end up transforming 8 bytes
into 8 bytes. The MMX registers hold a to d and e to h, and the calculation
is done on these. For an 8x8 block you have to call the routine 8 times.
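
For reference, here is a minimal plain C sketch of that row-by-row
scheme. It is not monsti's attached routine; the names WHT8_SIGN,
wht8_row and wht8_block_rows are made up for illustration. It assumes
the standard sequency-ordered Walsh-Hadamard signs (fifth row
+ - - + + - - +), 16-bit samples, and it leaves out the /8 scaling so
the results stay exact integers.

```c
#include <stdint.h>

/* Signs of the sequency-ordered 8-point Walsh-Hadamard transform.
 * Row k gives the signs of a..h in the k-th output (a'..h'). */
static const int8_t WHT8_SIGN[8][8] = {
    { 1,  1,  1,  1,  1,  1,  1,  1},
    { 1,  1,  1,  1, -1, -1, -1, -1},
    { 1,  1, -1, -1, -1, -1,  1,  1},
    { 1,  1, -1, -1,  1,  1, -1, -1},
    { 1, -1, -1,  1,  1, -1, -1,  1},
    { 1, -1, -1,  1, -1,  1,  1, -1},
    { 1, -1,  1, -1, -1,  1, -1,  1},
    { 1, -1,  1, -1,  1, -1,  1, -1},
};

/* Transform one row of 8 samples (unscaled, no /8). */
static void wht8_row(const int16_t in[8], int16_t out[8])
{
    for (int k = 0; k < 8; k++) {
        int sum = 0;
        for (int n = 0; n < 8; n++)
            sum += WHT8_SIGN[k][n] * in[n];
        out[k] = (int16_t)sum;
    }
}

/* Row pass over an 8x8 block: one call per row, 8 calls in total. */
static void wht8_block_rows(int16_t block[8][8])
{
    for (int r = 0; r < 8; r++) {
        int16_t out[8];
        wht8_row(block[r], out);
        for (int k = 0; k < 8; k++)
            block[r][k] = out[k];
    }
}
```

Each call works on a single row, so any parallelism has to come from
inside that row.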

However, I had thought it would be possible to do the calculations without
multiplication and by calculating several (8 or 4) rows of this in
parallel.

So one MMX register would hold a_1,a_2,a_3,...,a_8 (or a_4), another
one b_1,b_2,b_3,...,b_8 (or b_4), etc. Or let's just say we take the same
formula as we have, but a,b,c,d,e,f,g,h are _vectors_ of 4 or 8
components.
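
To make the idea concrete, here is a rough sketch in plain C rather than
real MMX. The type vec4 stands in for one MMX register holding four
16-bit words, and vadd/vsub stand in for paddw/psubw; all of these names
are made up. It assumes 16-bit lanes (a sum of eight bytes no longer fits
in a byte), the standard sequency-ordered signs (fifth row
+ - - + + - - +), and it leaves out the /8 scaling. Only additions and
subtractions are used, 24 of them for four rows at once.

```c
#include <stdint.h>

/* Stand-in for one 64-bit MMX register: four 16-bit lanes,
 * one lane per row being transformed. */
typedef struct { int16_t w[4]; } vec4;

/* Lane-wise add, the role paddw would play. */
static vec4 vadd(vec4 x, vec4 y)
{
    for (int i = 0; i < 4; i++)
        x.w[i] = (int16_t)(x.w[i] + y.w[i]);
    return x;
}

/* Lane-wise subtract, the role psubw would play. */
static vec4 vsub(vec4 x, vec4 y)
{
    for (int i = 0; i < 4; i++)
        x.w[i] = (int16_t)(x.w[i] - y.w[i]);
    return x;
}

/* v[0..7] hold a..h on entry and a'..h' (unscaled) on exit. */
static void wht8_lanes(vec4 v[8])
{
    /* Stage 1: neighbouring pairs. */
    vec4 s0 = vadd(v[0], v[1]), d0 = vsub(v[0], v[1]);
    vec4 s1 = vadd(v[2], v[3]), d1 = vsub(v[2], v[3]);
    vec4 s2 = vadd(v[4], v[5]), d2 = vsub(v[4], v[5]);
    vec4 s3 = vadd(v[6], v[7]), d3 = vsub(v[6], v[7]);

    /* Stage 2: combine within each half (a..d and e..h). */
    vec4 ss0 = vadd(s0, s1), sd0 = vsub(s0, s1);  /* a+b+c+d, a+b-c-d */
    vec4 ss1 = vadd(s2, s3), sd1 = vsub(s2, s3);  /* e+f+g+h, e+f-g-h */
    vec4 ds0 = vadd(d0, d1), dd0 = vsub(d0, d1);  /* a-b+c-d, a-b-c+d */
    vec4 ds1 = vadd(d2, d3), dd1 = vsub(d2, d3);  /* e-f+g-h, e-f-g+h */

    /* Stage 3: combine the halves, stored in the order a'..h'. */
    v[0] = vadd(ss0, ss1);
    v[1] = vsub(ss0, ss1);
    v[2] = vsub(sd0, sd1);
    v[3] = vadd(sd0, sd1);
    v[4] = vadd(dd0, dd1);
    v[5] = vsub(dd0, dd1);
    v[6] = vsub(ds0, ds1);
    v[7] = vadd(ds0, ds1);
}
```

This version uses many temporaries for readability, which says nothing
yet about how many real registers the same work would need.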

Would this be possible, too? Or is there one MMX register missing for
that?
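
One possible answer, as a sketch only and not tried on real hardware:
MMX has eight registers (mm0 to mm7), so a..h as word vectors fill all
of them, and a butterfly that needs a scratch copy would be one register
short. A butterfly can however be written with three two-operand
add/subtract steps and no scratch register. The plain C model below
assumes this; vec4, vadd_into, vsub_into, bfly, wht8_inplace and
WHT8_OUT_REG are made-up names, the /8 scaling is left out, and the
standard sequency-ordered signs are assumed. The results end up in
permuted registers, which the table records.

```c
#include <stdint.h>

/* Stand-in for one MMX register: four 16-bit lanes. */
typedef struct { int16_t w[4]; } vec4;

/* d += s, modelled on the two-operand form of paddw. */
static void vadd_into(vec4 *d, const vec4 *s)
{
    for (int i = 0; i < 4; i++)
        d->w[i] = (int16_t)(d->w[i] + s->w[i]);
}

/* d -= s, modelled on the two-operand form of psubw. */
static void vsub_into(vec4 *d, const vec4 *s)
{
    for (int i = 0; i < 4; i++)
        d->w[i] = (int16_t)(d->w[i] - s->w[i]);
}

/* Butterfly without a scratch register.
 * Afterwards *x = x0 - y0 and *y = x0 + y0. */
static void bfly(vec4 *x, vec4 *y)
{
    vsub_into(x, y);  /* x = x0 - y0                    */
    vadd_into(y, y);  /* y = 2*y0                       */
    vadd_into(y, x);  /* y = 2*y0 + (x0 - y0) = x0 + y0 */
}

/* Output k (0 = a', ..., 7 = h') ends up in v[WHT8_OUT_REG[k]]. */
static const int WHT8_OUT_REG[8] = {7, 3, 1, 5, 4, 0, 2, 6};

/* In-place transform of a..h held in v[0..7], using only those
 * eight vectors. Assumes the 16-bit lanes do not overflow. */
static void wht8_inplace(vec4 v[8])
{
    /* Stage 1: neighbouring pairs. */
    bfly(&v[0], &v[1]); bfly(&v[2], &v[3]);
    bfly(&v[4], &v[5]); bfly(&v[6], &v[7]);
    /* Stage 2: within each half. */
    bfly(&v[1], &v[3]); bfly(&v[0], &v[2]);
    bfly(&v[5], &v[7]); bfly(&v[4], &v[6]);
    /* Stage 3: across the halves. */
    bfly(&v[3], &v[7]); bfly(&v[1], &v[5]);
    bfly(&v[0], &v[4]); bfly(&v[2], &v[6]);
}
```

The price is 36 add/subtract steps instead of 24 plus register copies;
whether that beats spilling one value to memory would have to be
measured.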

gruel