[XviD-devel] Inlined ASM code again

Edouard Gomez ed.gomez at free.fr
Thu Aug 21 04:10:13 CEST 2003


Christoph Lampert (chl at math.uni-bonn.de) wrote:
> 58 vs 27   for adding 1 to the pointer per iteration.
> 69 vs 38   for adding 8 to the pointer per iteration
> 250 vs 211 for adding 720 to the pointer, same for 64.

Perhaps this can help you judge whether it's profitable or not...
The code is not tested at all (I just had a look at profiles and speed,
not at stream or frame output).

My mirror is up on free.fr for fellows who test arch/tla; otherwise you
can find the patch attached.

2003-08-21 01:00:32 GMT	Edouard Gomez <ed.gomez at free.fr>	patch-1

    Summary:
      Added first mmx/xmm gcc intrinsic code.
    Revision:
      xvidcore--devapi4-gcc--1.0--patch-1

    Well, the code is not so nice, but the task has to be started one
    way or another. Here's what I've done so far.
     - A define RUNTIME_FUNCTIONS controls whether we use runtime detection
       and function pointers. ATM, there is no way to disable code inlining
       and direct function calls for calc_cbp, sad8/16/16v. I'll patch the
       configure script so it becomes an easy-to-use option.
       /!\ ATM I changed the code in a way that can break ports to non-IA32
           archs, so don't use that branch on other archs.
     - quickly translated calc_cbp_mmx, sad8_xmm, sad16_xmm and sad16v_xmm
       to gcc mmx/xmm intrinsics. The functions are suffixed with 'gcc'. No
       regression tests have been performed for the sad functions so far;
       it's just a test.
     - function pointers that have a gcc intrinsic candidate have been renamed
       to dsp_${oldname}. This makes it easier to detect automatically where
       assembly functions are used in XviD (the old names are often used in
       data structures as well, which makes it hard to see what is a function
       call and what is not).
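A minimal sketch of the RUNTIME_FUNCTIONS idea described above, assuming a plain C 8x8 SAD as the only implementation. With the define set, call sites go through a dsp_-prefixed function pointer that init code could retarget after CPU detection; without it, dsp_sad8 names the function directly so gcc can inline it. The names sad8_c, dsp_init and the cpu_flags parameter are hypothetical illustrations, not the patch's actual code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical portable reference: sum of absolute differences, 8x8 block. */
static inline uint32_t sad8_c(const uint8_t *cur, const uint8_t *ref,
                              uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs(cur[x] - ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

#ifdef RUNTIME_FUNCTIONS
/* Runtime path: every call goes through a pointer chosen at init time. */
typedef uint32_t (*sad8_fn)(const uint8_t *, const uint8_t *, uint32_t);
static sad8_fn dsp_sad8 = sad8_c;

static void dsp_init(unsigned cpu_flags)
{
    dsp_sad8 = sad8_c;
    /* hypothetical: if cpu_flags reports XMM, point dsp_sad8 at an
     * MMX/XMM variant here instead */
    (void)cpu_flags;
}
#else
/* Compile-time path: a direct name, so the compiler may inline the call. */
#define dsp_sad8 sad8_c
static void dsp_init(unsigned cpu_flags) { (void)cpu_flags; }
#endif
```

Callers write `dsp_init(flags);` once and then `dsp_sad8(cur, ref, stride)` everywhere, so switching between the two modes only means toggling the define.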
    
    NB: the mmx/xmm code depends very much on the gcc version you use. gcc 3.2.3
        outputs good code, gcc 3.3.x outputs slow code (full of unneeded
        reads/writes), and experimental gcc 3.4 seems to output something close
        to gcc 3.2.3 (so it's good too).
    
    PS: in order to compile, you need to add these switches to gcc in
        platform.inc (either in SPECIFIC_CFLAGS or CFLAGS):
        -mmmx -msse to activate mmx/xmm intrinsics/builtins
        -finline-limit=1200 to force inlining of sad16v, which is a rather big
         function
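For readers who haven't used the gcc mmx/xmm intrinsics mentioned above, here is a rough sketch of an 8x8 SAD built on `_mm_sad_pu8` (the psadbw instruction on 64-bit MMX registers, from `<xmmintrin.h>`). It is not the patch's sad8 code; the name sad8_gcc only borrows the post's 'gcc' suffix and is hypothetical. On 32-bit x86 it needs the `-mmmx -msse` switches listed above (x86-64 enables both by default), and the `#if` guard keeps a plain C fallback so non-IA32 builds still compile.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; also provides the __m64 MMX ones */

/* Hypothetical 8x8 SAD using psadbw via _mm_sad_pu8. */
static inline uint32_t sad8_gcc(const uint8_t *cur, const uint8_t *ref,
                                uint32_t stride)
{
    __m64 acc = _mm_setzero_si64();
    for (int y = 0; y < 8; y++) {
        __m64 c, r;
        memcpy(&c, cur, 8);  /* 8-byte loads without alignment assumptions */
        memcpy(&r, ref, 8);
        /* psadbw leaves the row SAD in the low 16 bits, upper bits zero */
        acc = _mm_add_pi32(acc, _mm_sad_pu8(c, r));
        cur += stride;
        ref += stride;
    }
    uint32_t sad = (uint32_t)_mm_cvtsi64_si32(acc);
    _mm_empty();  /* emms: restore the x87 state after MMX use */
    return sad;
}
#else
/* Fallback for archs without MMX/SSE: same result in plain C. */
static inline uint32_t sad8_gcc(const uint8_t *cur, const uint8_t *ref,
                                uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs(cur[x] - ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}
#endif
```

The maximum 8x8 SAD is 64 * 255 = 16320, so accumulating in the low 32 bits cannot overflow.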

    new files:
     ./src/.arch-ids/ia32-mmx-gcc.h.id ./src/ia32-mmx-gcc.h

    modified files:
     ./examples/xvid_bench.c ./src/bitstream/cbp.c
     ./src/bitstream/cbp.h ./src/encoder.c
     ./src/motion/motion_est.c ./src/motion/sad.c
     ./src/motion/sad.h ./src/portab.h
     ./src/prediction/mbprediction.c ./src/xvid.c


2003-08-19 12:47:13 GMT	Edouard Gomez <ed.gomez at free.fr>	base-0

    Summary:
      tag of ed.gomez at free.fr--2003-1/xvidcore--devapi4--1.0--patch-29
    Revision:
      xvidcore--devapi4-gcc--1.0--base-0

    (automatically generated log message)

-- 
Edouard Gomez

