[XviD-devel] MMX/SSE/SEE2 implementation

Thu, 12 Dec 2002 09:29:22 -0500

| Anyway, if I remember correctly, the most frequently used function
| in XVID is the very simple sad16() i.e. sad16_c(), sad16_mmx(),
| sad16_sse() etc.
|
| If sad16 got faster by using intrinsics (including reordering)
| instead of nasm, this might be a good point in favour of switching
| (as Edouard suggested a while ago). Can you check this?

Last summer I played with various nasm sad16_sse2 optimizations for
XviD, but none of them seemed to make much difference for some
reason, so I never released any of them.

- Tom
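
For readers following the sad16 question, here is a rough sketch of what an
intrinsics version could look like next to a plain C reference. It is only an
illustration under assumptions: the function names (sad16_ref,
sad16_sse2_sketch) and the three-argument signature are hypothetical and are
not taken from the XviD sources, and the SSE2 path is only compiled when the
compiler defines __SSE2__ (otherwise it falls back to the C loop).

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Plain C reference: sum of absolute differences over a 16x16 block. */
static uint32_t sad16_ref(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

/* Hypothetical intrinsics version (illustrative name, not XviD's). */
static uint32_t sad16_sse2_sketch(const uint8_t *cur, const uint8_t *ref,
                                  int stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        /* Unaligned loads, so no alignment is assumed for either block. */
        __m128i c = _mm_loadu_si128((const __m128i *)cur);
        __m128i r = _mm_loadu_si128((const __m128i *)ref);
        /* PSADBW yields two partial sums, one per 64-bit half. */
        acc = _mm_add_epi32(acc, _mm_sad_epu8(c, r));
        cur += stride;
        ref += stride;
    }
    /* Fold the high-half partial sum onto the low-half one. */
    acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
    return (uint32_t)_mm_cvtsi128_si32(acc);
#else
    /* No SSE2 available at compile time: use the reference loop. */
    return sad16_ref(cur, ref, stride);
#endif
}
```

Timing both functions on the same blocks, and comparing against the nasm
routine, would be one way to check whether compiler scheduling changes
anything for a loop this small.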

| -----Original Message-----
| From: xvid-devel-admin@xvid.org [mailto:xvid-devel-admin@xvid.org]On
| Behalf Of Christoph Lampert
| Sent: Thursday, December 12, 2002 8:11 AM
| To: xvid-devel@xvid.org
| Subject: RE: [XviD-devel] MMX/SSE/SEE2 implementation
|
|
| On Thu, 12 Dec 2002, James Hauxwell wrote:
| >
| > Yes the compiler can reorder the code, and it is a good feature.
| > Most architectures have multiple pipes, so the idea is that you
| > need to keep them busy for most of the time.  Out of order
| > execution is common, and the compiler is normally better at taking
| > advantage of this.
|
| In _theory_, yes, of course. But compilers don't know everything,
| e.g. sometimes it's good to do software prefetch at the right place
| instead of another, which a compiler does not understand, because it
| can't see all dependencies.
| If I were an asm hacker, I would refuse to use a system where this
| feature cannot be switched off. But I guess it can?
|
| Anyway, if I remember correctly, the most frequently used function
| in XVID is the very simple sad16() i.e. sad16_c(), sad16_mmx(),
| sad16_sse() etc.
|
| If sad16 got faster by using intrinsics (including reordering)
| instead of nasm, this might be a good point in favour of switching
| (as Edouard suggested a while ago). Can you check this?
|
| > Other points taken, but I'm not talking about inline assembly, but
| > rather intrinsics.  There is a difference.
|
| Sorry, you are right. However, there's no file mmintrin.h or similar
| on my Linux system, so people would have to install that instead of
| nasm. I guess the file is compiler dependent, so we can't just
| include it all in XVID.
|
| gruel
|
|
|
| > -----Original Message-----
| > From: xvid-devel-admin@xvid.org [mailto:xvid-devel-admin@xvid.org] On
| > Behalf Of chl@math.uni-bonn.de
| > Sent: 12 December 2002 11:24
| > To: xvid-devel@xvid.org
| > Subject: Re: [XviD-devel] MMX/SSE/SEE2 implementation
| >
| > Hi,
| >
| > we switched from inline assembler to NASM a while ago (for x86 asm),
| > because NASM is available and compatible for almost any x86 platform
| > whereas inline assembler isn't. This way, assembler code only has to
| > be written once, and doesn't have to be rewritten for every supported
| > x86 compiler. Before that, the usual behaviour was that hackers using
| > Windows didn't write assembler for gcc and vice versa. :(
| >
| > There are some methods to overcome this problem (by special macros),
| > but at the moment, we are rather happy with nasm (if only people
| > learned to install a recent version).
| >
| > Btw. are you sure that inline assembler is really optimized
| > (reordered) by the compiler? If I were an assembler programmer, I
| > would _hate_ this "feature"...
| >
| > gruel
| >
| >
| > On Thu, 12 Dec 2002, James Hauxwell wrote:
| > > Hi,
| > >
| > > I have been doing some experiments lately with the Intel/Microsoft
| > > compiler with intrinsics and have been very surprised with the
| > > results.
| > >
| > > To touch base here, what are people's opinions on recoding the mmx
| > > and such routines using compiler intrinsics?
| > >
| > > My investigations have discovered the following plus points.
| > >
| > > 1, easier to read/code and debug.
| > > 2, you don't have to worry about register allocation and scheduling
| > >    as the compiler does it for you.
| > > 3, you can rebuild for different CPU targets, P4 or P3 or Athlon,
| > >    and the compiler will best decide how to schedule the
| > >    instructions to avoid stalls.
| > > 4, easier to test new optimizations.
| > > 5, don't need NASM to build, only your compiler.
| > >
| > > Negative points are
| > >
| > > 1, the work required to do it.
| > > 2, GCC and PC compilers do not share the same intrinsic names.
| > > 3, probably others as you will write back and inform me :-)
| > >
| > > As an example, here is a quick version of add_c which took about
| > > 10 minutes to write.
| > >
| > > #include <mmintrin.h>
| > >
| > > void add_c(unsigned char *restrict predictor,
| > > 		const short *restrict error,
| > > 		int predictor_stride)
| > > {
| > > 	int i;
| > > 	const __m64 zero = _mm_setzero_si64();
| > >
| > > #pragma unroll(2)
| > > 	for (i = 0; i < 8; i++)
| > > 	{
| > > 		/* load one row of 8 predictor pixels */
| > > 		__m64 pred = *(__m64 *)predictor;
| > >
| > > 		/* extract out 8 to 16 bit */
| > > 		__m64 x0_low = _mm_unpacklo_pi8(pred, zero);
| > > 		__m64 x0_high = _mm_unpackhi_pi8(pred, zero);
| > >
| > > 		/* add the error (8 shorts per row, as two __m64) */
| > > 		x0_low = _mm_adds_pi16(x0_low, ((const __m64 *)error)[2 * i]);
| > > 		x0_high = _mm_adds_pi16(x0_high, ((const __m64 *)error)[2 * i + 1]);
| > >
| > > 		/* saturate to 0..255, pack and store back to the predictor */
| > > 		*(__m64 *)predictor = _mm_packs_pu16(x0_low, x0_high);
| > > 		predictor += predictor_stride;
| > > 	}
| > >
| > > 	/* clear the MMX state before any floating-point code runs */
| > > 	_mm_empty();
| > > }
| > >
| > > You can see that it's very easy to change the unroll amount,
| > > whether you use prefetch or not, or remove the restricted pointers
| > > and go back to normal aliasing mode.
| > >
| > > What do people think?
| > >
| > > Jim
| > >
| > > _______________________________________________
| > > XviD-devel mailing list
| > > XviD-devel@xvid.org
| > > http://list.xvid.org/mailman/listinfo/xvid-devel
| > >