[XviD-devel] discussion continue...

Marco Al xvid-devel@xvid.org
Tue, 10 Sep 2002 20:39:58 +0200


Marc FD wrote:
> ----- Message d'origine -----
> De : "skal" <skal@planet-d.net>
> À : <xvid-devel@xvid.org>
> Cc : <XviD-devel@xvid.org>
> Envoyé : mardi 10 septembre 2002 13:52
> Objet : Re: [XviD-devel] discussion continue...
>
>
>> Marc,
>>
>> On Mon, 2002-09-09 at 17:24, Marc FD wrote:
>>
>>> regarding "Hadamarad ;-)" my stuff was in fact 25% slower than skal's.
>>> I'm not that bad if i can reach a master in the matter ;))
>>
>> you can do much better than this, by cooperating:
>>
>> take the first_pass+transpose code, throw it away,
>> and replace it by a mix of yours and monsti's (list not
>> exhaustive). Keep the second pass as is, anyhow.
>> It's a 5min work, and the result will eventually be
>> better by, let's say 20% at least, than *any* of the
>> original codes, no matter what tremendous brain juice
>> is injected hand-optimizing them *separately*.
>> Perfect example of code cooperation.
>>
>> here are the details:
>>
>> the O(N.lnN) algo ("butterfly") beats your O(N^2) one.
>> Take it for granted. If not for N=8, it'll be for N=16,
>> or higher...
>> But, we're not running any N, but specifically N=8.
>> Each butterfly pass takes 58cycles, but needs 2 transposes
>> (O(N)), which, for N=8, takes ~80cycles. Hence the
>> total: 2*58c + 80c = 196c. But if we get rid
>> of the transpose+1butterfly and replace it by your
>> O(N^2) code, we gain: 80c+58c - 112c = 26c. Easy money
>> in 5minutes (the time it takes to write an angry mail ;).
>>
>> bye,
>>
>> Skal
>
> sorry, maybe i'm sumb, (i'm even pretty sure of that!)
> but i understand almost nothing of what you said :
> what is O ? InN ? why N^2

It's known as big-O notation, random link picked off google :
http://www.nist.gov/dads/HTML/bigOnotation.html

> i used full butterflyed vertical (2x32) and horizontal(8x8).
> i thinked it was slower because i'm not using
> only vertical + transpose (faster?)

I assume he is suggesting that using butterflies for the columns
using vector processing, and full matrix multiply (which can be
fully parallized with MMX without transposing) for the rows.

Marco