Re[2]: [XviD-devel] Hadamard Transform

monsti xvid-devel@xvid.org
Fri, 6 Sep 2002 13:29:11 +0200


------------2DFB13539C7EBAD
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit



CL> now the MMXers around can tell me something...
CL> monsti MMXed this calculation, so you really end up transforming 8 bytes
CL> into 8 bytes. The MMX registers hold a to d and e to h, and the calculation
CL> is done on these. For an 8x8 block you have to call the routine 8 times.

CL> However, I had thought it would be possible to do the calculations without
CL> multiplication, by calculating several (8 or 4) rows of this in
CL> parallel.

CL> So one MMX register would hold a_1,a_2,a_3,...,a_8 (or a_4), another
CL> one b_1,b_2,b_3,...,b_8 (or b_4), etc. Or let's just say we take the same
CL> formula as we have, but a,b,c,d,e,f,g,h are _vectors_ of 4 or 8
CL> components.

CL> Would this be possible, too? Or is there one MMX register missing for
CL> that?

What do you think about another piece of junk from me? It is not well
optimized yet; I will correct that in the future.

This is only an example.
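
To make the question quoted above concrete, here is a minimal sketch in
portable C (not MMX code, and not the attached routine) of the vector idea:
a..h each become a vector of 4 signed 16-bit words, which is what one 64-bit
MMX register holds, and the 8-point transform uses add/subtract butterflies
only, with no multiplication. The vec4 type, the helper names, and the
natural, unscaled Hadamard ordering are assumptions made for illustration;
the formula used earlier in the thread may order or scale its outputs
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* one 64-bit MMX register holds 4 signed 16-bit words */

/* Hypothetical helper type: one register's worth of lanes, e.g. a_0..a_3. */
typedef struct { int16_t w[LANES]; } vec4;

/* Lane-wise add. Plain (non-saturating) arithmetic; inputs are assumed
   small enough that no lane overflows 16 bits. */
static vec4 vadd(vec4 x, vec4 y) {
    vec4 r;
    for (int i = 0; i < LANES; i++) r.w[i] = (int16_t)(x.w[i] + y.w[i]);
    return r;
}

/* Lane-wise subtract, same assumptions as vadd. */
static vec4 vsub(vec4 x, vec4 y) {
    vec4 r;
    for (int i = 0; i < LANES; i++) r.w[i] = (int16_t)(x.w[i] - y.w[i]);
    return r;
}

/* In-place 8-point Hadamard transform (natural ordering, unscaled),
   applied to LANES independent columns at once.
   v[0]..v[7] play the role of a..h. Three butterfly stages. */
static void hadamard8_lanes(vec4 *v) {
    assert(v != NULL);
    for (int half = 1; half < 8; half *= 2) {
        for (int base = 0; base < 8; base += 2 * half) {
            for (int k = base; k < base + half; k++) {
                vec4 s = vadd(v[k], v[k + half]); /* sum branch        */
                vec4 d = vsub(v[k], v[k + half]); /* difference branch */
                v[k] = s;
                v[k + half] = d;
            }
        }
    }
}
```

Calling hadamard8_lanes on an array of 8 vec4 values transforms 4 columns in
one pass: 12 butterflies, i.e. 24 lane-wise adds/subtracts in total. Each
vadd/vsub stands in for one packed 16-bit add or subtract on a register. MMX
has eight registers (mm0 to mm7), so with all of a..h held in registers there
is no spare one for the copy a straightforward butterfly needs, and some
operands would probably have to stay in memory.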

-- 
Best regards,
 monsti                            mailto:monstrum@tlen.pl

Lukasz Tomczykiewicz
------------2DFB13539C7EBAD
Content-Type: text/plain; name="hadamard_parralel.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="hadamard_parralel.txt"

; a, b, c, d, e, f, g, h     - source, in words
; aa, bb, cc, dd, ee, ff, gg, hh - destination, in words too...

    movq mm0, a         ; mm0 = a3 a2 a1 a0
    movq mm1, b         ; mm1 = b3 b2 b1 b0
    movq mm2, mm0
    paddsw mm0, mm1     ; mm0 = (a3 + b3) (a2 + b2) (a1 + b1) (a0 + b0)
    psubsw mm2, mm1     ; mm2 = (a3 - b3) (a2 - b2) (a1 - b1) (a0 - b0)
    movq mm4, c         ; mm4 = c3 c2 c1 c0
    movq mm1, mm0
    movq mm3, mm2
    paddsw mm0, mm4     ; mm0 = 4 x ( a + b + c )
    psubsw mm1, mm4     ; mm1 = 4 x ( a + b - c )
    psubsw mm2, mm4     ; mm2 = 4 x ( a - b - c )
    paddsw mm3, mm4     ; mm3 = 4 x ( a - b + c )
    movq mm4, d
    paddsw mm0, mm4     ; mm0 = 4 x ( a + b + c + d )
    psubsw mm1, mm4     ; mm1 = 4 x ( a + b - c - d )
    paddsw mm2, mm4     ; mm2 = 4 x ( a - b - c + d )
    psubsw mm3, mm4     ; mm3 = 4 x ( a - b + c - d )

    movq mm4, e
    paddsw mm4, f
    paddsw mm4, g       ; mm4 = 4 x ( e + f + g )
    movq mm5, h

    movq mm6, mm0
    paddsw mm6, mm4
    paddsw mm6, mm5
    psraw mm6, 3        ; mm6 = 4 x a'
    movq aa, mm6

    movq mm6, mm2
    psubsw mm6, mm4
    psubsw mm6, mm5
    psraw mm6, 3        ; mm6 = 4 x f'
    movq ff, mm6

    movq mm6, mm3
    psubsw mm6, mm4     ; not well optimized
    psubsw mm6, mm5     ; maybe in future
    psraw mm6, 3        ; mm6 = 4 x g'
    movq gg, mm6

    movq mm4, e
    psubsw mm4, f
    psubsw mm4, g
    psubsw mm0, mm4
    psubsw mm0, mm5
    psraw mm0, 3        ; mm0 = 4 x b'
    movq bb, mm0

    paddsw mm3, mm4     ; same here
    paddsw mm3, mm5
    psraw mm3, 3        ; mm3 = 4 x h'
    movq hh, mm3

    movq mm0, e
    movq mm3, f
    movq mm4, g
    movq mm6, mm3
    paddsw mm6, mm4     ; mm6 = 4 x ( f + g )
    psubsw mm3, mm4     ; mm3 = 4 x ( f - g )
    paddsw mm2, mm0
    psubsw mm2, mm6
    psubsw mm2, mm5
    psraw mm2, 3        ; mm2 = 4 x e'
    movq ee, mm2

    movq mm2, mm1
    psubsw mm1, mm0
    paddsw mm2, mm0
    psubsw mm1, mm6
    paddsw mm2, mm3
    paddsw mm1, mm5
    psubsw mm2, mm5
    psraw mm1, 3        ; mm1 = 4 x c'
    psraw mm2, 3        ; mm2 = 4 x d'
    movq cc, mm1
    movq dd, mm2

------------2DFB13539C7EBAD--