[XviD-devel] Inlined ASM code again
Edouard Gomez
ed.gomez at free.fr
Thu Aug 21 04:10:13 CEST 2003
Christoph Lampert (chl at math.uni-bonn.de) wrote:
> 58 vs 27 for adding 1 to the pointer per iteration.
> 69 vs 38 for adding 8 to the pointer per iteration
> 250 vs 211 for adding 720 to the pointer, same for 64.
Perhaps this can help you judge whether it's profitable or not...
The code is not tested at all (I only looked at profiles and speed,
not at the stream or frame output).
My mirror is up on free.fr for those who want to try arch/tla; otherwise
you can find the patch attached.
2003-08-21 01:00:32 GMT Edouard Gomez <ed.gomez at free.fr> patch-1
Summary:
Added first mmx/xmm gcc intrinsic code.
Revision:
xvidcore--devapi4-gcc--1.0--patch-1
Well, the code is not so nice, but the task has to be started one
way or another. Here's what I've done so far.
- A define, RUNTIME_FUNCTIONS, controls whether we use runtime detection
and function pointers. ATM, there is no way to disable code inlining
and direct function calls for calc_cbp, sad8/16/16v. I'll patch the
configure script so that it becomes an easy-to-use option.
/!\ ATM I changed the code in a way that can break ports to non-IA32
archs, so don't use this branch on other archs.
- Quickly translated calc_cbp_mmx, sad8_xmm, sad16_xmm and sad16v_xmm
to gcc mmx/xmm intrinsics. The functions are suffixed with 'gcc'. No regression
tests have been performed for the sad functions so far. It's just a test.
- Function pointers that have a gcc intrinsic candidate have been renamed
to dsp_${oldname}. This has been done to help automatic detection of
assembly function usage in XviD (the old names are often used in data structures
as well, which makes it hard to see what is a function call and what is
not).
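To make the first and third points more concrete, here is a minimal sketch of the idea; it is not the code from the patch. Only RUNTIME_FUNCTIONS and the dsp_ prefix come from the changelog above. The names sad8Func, sad8_c, sad8_xmm_gcc and init_dsp, as well as the cpu_has_xmm flag, are hypothetical, and the "intrinsic" version is just a stand-in that calls the C one.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical signature for an 8x8 SAD function. */
typedef uint32_t (sad8Func)(const uint8_t *cur, const uint8_t *ref,
                            uint32_t stride);

/* Plain C reference: sum of absolute differences over an 8x8 block. */
static uint32_t sad8_c(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

/* Stand-in for the intrinsic version (assumed name); a real one would
 * use mmx/xmm intrinsics instead of forwarding to the C code. */
static inline uint32_t sad8_xmm_gcc(const uint8_t *cur, const uint8_t *ref,
                                    uint32_t stride)
{
    return sad8_c(cur, ref, stride);
}

#ifdef RUNTIME_FUNCTIONS
/* Runtime detection: callers go through a function pointer. */
static sad8Func *dsp_sad8 = sad8_c;

static void init_dsp(int cpu_has_xmm)
{
    dsp_sad8 = cpu_has_xmm ? sad8_xmm_gcc : sad8_c;
}
#else
/* No runtime detection: dsp_sad8 is a direct call the compiler can inline. */
#define dsp_sad8(cur, ref, stride) sad8_xmm_gcc((cur), (ref), (stride))

static void init_dsp(int cpu_has_xmm)
{
    (void)cpu_has_xmm; /* nothing to select at runtime */
}
#endif
```

Either way the call sites read dsp_sad8(cur, ref, stride), so a grep for "dsp_" finds every place where an optimized function is used, whether or not RUNTIME_FUNCTIONS is defined.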
NB: the quality of the mmx/xmm code depends very much on the gcc version you
use. gcc 3.2.3 outputs good code, gcc 3.3.x outputs slow code (full of unneeded
reads/writes), and experimental gcc 3.4 seems to output something close
to gcc 3.2.3 (so it's good too).
PS: in order to compile you need to add these switches to gcc in platform.inc
(either in SPECIFIC_CFLAGS or CFLAGS)
-mmmx -msse to activate mmx/xmm intrinsics/builtins
-finline-limit=1200 to force inlining of sad16v which is a rather big
function
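For those who have never seen them, this is roughly what an 8x8 SAD looks like with the intrinsics that -mmmx -msse enable. It is a sketch, not the code from the patch: the names sad8_c, sad8_xmm_sketch and sad8_sketch are made up for illustration, and the scalar fallback is only there so the file still builds when the switches are missing or the target is not IA32.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>   /* _mm_sad_pu8 (psadbw on MMX registers) */
#define HAVE_XMM_INTRINSICS 1
#endif

/* Plain C reference, used as fallback and for checking results. */
static uint32_t sad8_c(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

#ifdef HAVE_XMM_INTRINSICS
/* One psadbw per row of 8 pixels, accumulated in an MMX register. */
static inline uint32_t sad8_xmm_sketch(const uint8_t *cur, const uint8_t *ref,
                                       uint32_t stride)
{
    __m64 acc = _mm_setzero_si64();
    for (int y = 0; y < 8; y++) {
        __m64 c, r;
        memcpy(&c, cur, 8);          /* unaligned-safe 8-byte loads */
        memcpy(&r, ref, 8);
        acc = _mm_add_pi32(acc, _mm_sad_pu8(c, r));
        cur += stride;
        ref += stride;
    }
    uint32_t sad = (uint32_t)_mm_cvtsi64_si32(acc);
    _mm_empty();                     /* leave MMX state before returning */
    return sad;
}
#endif

/* Picks the intrinsic version when it was compiled in. */
static inline uint32_t sad8_sketch(const uint8_t *cur, const uint8_t *ref,
                                   uint32_t stride)
{
#ifdef HAVE_XMM_INTRINSICS
    return sad8_xmm_sketch(cur, ref, stride);
#else
    return sad8_c(cur, ref, stride);
#endif
}
```

Because the function is static inline, the compiler can merge it into the caller, which is the point of the exercise. Whether it actually does so for the larger functions depends on the inlining limits, hence the -finline-limit switch.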
new files:
./src/.arch-ids/ia32-mmx-gcc.h.id ./src/ia32-mmx-gcc.h
modified files:
./examples/xvid_bench.c ./src/bitstream/cbp.c
./src/bitstream/cbp.h ./src/encoder.c
./src/motion/motion_est.c ./src/motion/sad.c
./src/motion/sad.h ./src/portab.h
./src/prediction/mbprediction.c ./src/xvid.c
2003-08-19 12:47:13 GMT Edouard Gomez <ed.gomez at free.fr> base-0
Summary:
tag of ed.gomez at free.fr--2003-1/xvidcore--devapi4--1.0--patch-29
Revision:
xvidcore--devapi4-gcc--1.0--base-0
(automatically generated log message)
--
Edouard Gomez