[XviD-devel] Inlined ASM code again

Christoph Lampert chl at math.uni-bonn.de
Thu Aug 21 16:03:30 CEST 2003


Hi,

please don't think that this is related to your patch, which I couldn't
test because gcc 2.95 lacks the intrinsics include files.
I just thought this thread might be read by people who are interested in
this stuff, so check out this discussion:

http://lists.insecure.org/lists/linux-kernel/2003/Feb/0501.html

and maybe this description of how icc optimizes (I still haven't found a
source on how gcc really does optimization):
http://www.linuxjournal.com/article.php?sid=4885

gruel

P.S. For me the benchmark given in 
http://lists.insecure.org/lists/linux-kernel/2003/Feb/0401.html
shows: 

Pentium III, 650 MHz, "just gcc" (no extra flags):
Proc std:        35880 kticks
Proc std inline: 35930 kticks
Proc sse:         5420 kticks
Proc sse inline:  5280 kticks

Pentium 4, 2.4 GHz, "just gcc" (no extra flags):
Proc std:         6830 kticks
Proc std inline:  6770 kticks
Proc sse:         1540 kticks
Proc sse inline:  1560 kticks

But on the Pentium 4, when compiled with -O3 -march=pentium4, the numbers change to:
Proc std:         4950 kticks
Proc std inline:  4940 kticks
Proc sse:         4200 kticks
Proc sse inline:  4170 kticks

And with -O3 alone, the entire intrinsics loop seems to have been optimized away:
Proc std:         4240 kticks
Proc std inline:  4270 kticks
Proc sse:          130 kticks
Proc sse inline:   130 kticks

What does this mean? I don't know, but I'm sure it means something. 

On Thu, 21 Aug 2003, Edouard Gomez wrote:
> Edouard Gomez (ed.gomez at free.fr) wrote:
> > My mirror is up on free.fr for fellows who test arch/tla; otherwise
> > you can find the patch attached.
> 
> As usual the filter cut my email; I wonder why it always dislikes my
> attachments :-)
> 
> Available here:
> http://ed.gomez.free.fr/vrac/gcc-intrinsics.diff.gz
> 
> @skal: sorry, but I don't see any advantage in using nasm over a cpp+cc
>        pair. You have macros in both cases; you have a lot more
>        control over variable declarations with a cc (and that counts on
>        Unix, where namespace pollution is a pain); and, most importantly,
>        you can hopefully use complex types (structures) directly in
>        the code. Doing so in nasm is far from easy because of the cc's
>        structure packing rules.
>        Now, I must admit I used intrinsics to test gcc's capabilities, but
>        it could be done with simple macros that just wrap MMX
>        opcodes and nothing more (mmx.h in ffmpeg or mplayer or ...), so
>        you would still have the same flexibility as you have now with nasm.
> 
> -- 
> Edouard Gomez


