[XviD-devel] discussion continue...
skal
xvid-devel@xvid.org
10 Sep 2002 13:52:10 +0200
Marc,
On Mon, 2002-09-09 at 17:24, Marc FD wrote:
> regarding "Hadamarad ;-)" my stuff was in fact 25% slower than skal's.
> I'm not that bad if i can reach a master in the matter ;))
you can do much better than this, by cooperating:
take the first_pass+transpose code, throw it away,
and replace it by a mix of yours and monsti's (list not
exhaustive). Keep the second pass as is, anyhow.
It's 5 minutes of work, and the result will end up
better by, let's say, 20% at least than *any* of the
original codes, no matter how much brain juice
is injected hand-optimizing them *separately*.
Perfect example of code cooperation.
here are the details:
the O(N.lnN) algo ("butterfly") beats your O(N^2) one.
Take it for granted. If not for N=8, it'll be for N=16,
or higher...
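
To make the two approaches concrete, here is a minimal scalar sketch of
an 8-point Hadamard transform done both ways. It is plain C, not the
MMX code discussed in this thread; the function names and the natural
(Sylvester) output ordering are assumptions chosen for illustration.
The butterfly version uses 3 stages of 4 butterflies (24 add/sub), the
direct version 8 rows of 7 add/sub (56).

```c
#include <stddef.h>

/* Direct O(N^2) form: out[i] = sum_j H[i][j] * in[j], where
 * H[i][j] = -1 if (i & j) has an odd number of set bits, else +1. */
void hadamard8_direct(const int in[8], int out[8])
{
    for (size_t i = 0; i < 8; i++) {
        int sum = 0;
        for (size_t j = 0; j < 8; j++) {
            size_t bits = i & j;
            /* parity of a 3-bit value */
            size_t odd = (bits ^ (bits >> 1) ^ (bits >> 2)) & 1u;
            sum += odd ? -in[j] : in[j];
        }
        out[i] = sum;
    }
}

/* Butterfly O(N.logN) form, in place: log2(8) = 3 stages,
 * each stage pairing elements that are `half` apart. */
void hadamard8_butterfly(int v[8])
{
    for (size_t half = 1; half < 8; half <<= 1) {
        for (size_t base = 0; base < 8; base += 2 * half) {
            for (size_t k = base; k < base + half; k++) {
                int a = v[k];
                int b = v[k + half];
                v[k]        = a + b;
                v[k + half] = a - b;
            }
        }
    }
}
```

Both functions give the same 8 coefficients for the same input; only
the operation count differs.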
But we're not running an arbitrary N, just N=8.
Each butterfly pass takes 58 cycles, but the scheme also
needs 2 transposes (O(N)), which, for N=8, take ~80 cycles
together. Hence the total: 2*58c + 80c = 196c. But if we
get rid of the transposes + 1 butterfly pass and replace
them by your O(N^2) code, we gain: 80c+58c - 112c = 26c.
Easy money in 5 minutes (the time it takes to write an
angry mail ;).
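
The cycle budget above can be written out as a small sketch. The
constant and function names are hypothetical, and it reads the 112c
figure as the cost of the O(N^2) first pass, which the numbers imply
but the mail doesn't spell out.

```c
/* Cycle counts quoted in this mail (N=8). Names are illustrative. */
enum {
    BUTTERFLY_PASS_CYCLES = 58,   /* one butterfly pass                 */
    TRANSPOSES_CYCLES     = 80,   /* both transposes together (approx.) */
    DIRECT_PASS_CYCLES    = 112   /* O(N^2) pass, assumed from context  */
};

/* Two butterfly passes plus the transposes. */
int cycles_butterfly_only(void)
{
    return 2 * BUTTERFLY_PASS_CYCLES + TRANSPOSES_CYCLES;
}

/* O(N^2) first pass (no transposes), butterfly second pass. */
int cycles_hybrid(void)
{
    return DIRECT_PASS_CYCLES + BUTTERFLY_PASS_CYCLES;
}

/* Cycles gained by switching to the hybrid. */
int cycles_saved(void)
{
    return cycles_butterfly_only() - cycles_hybrid();
}
```

With these figures the hybrid comes to 170c against 196c, which is the
26c gain computed above.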
bye,
Skal