[XviD-devel] discussion continue...

skal xvid-devel@xvid.org
10 Sep 2002 13:52:10 +0200


	Marc,

On Mon, 2002-09-09 at 17:24, Marc FD wrote:

> regarding "Hadamarad ;-)" my stuff was in fact 25% slower than skal's.
> I'm not that bad if i can reach a master in the matter ;))

	you can do much better than this, by cooperating:

	take the first_pass+transpose code, throw it away,
	and replace it by a mix of yours and monsti's (list not
	exhaustive). Keep the second pass as is, anyhow.
	It's a 5min work, and the result will eventually be
	better by, let's say 20% at least, than *any* of the
	original codes, no matter what tremendous brain juice
	is injected hand-optimizing them *separately*.
	Perfect example of code cooperation.

	here are the details:

	the O(N.lnN) algo ("butterfly") beats your O(N^2) one.
	Take it for granted. If not for N=8, it'll be for N=16, 
	or higher...
	But, we're not running any N, but specifically N=8.
	Each butterfly pass takes 58cycles, but needs 2 transposes
	(O(N)), which, for N=8, takes ~80cycles. Hence the
	total: 2*58c + 80c = 196c. But if we get rid
	of the transpose+1butterfly and replace it by your
	O(N^2) code, we gain: 80c+58c - 112c = 26c. Easy money
	in 5minutes (the time it takes to write an angry mail ;).

	bye,

		Skal