[XviD-devel] Re: Quality optimization

Felix von Leitner xvid-devel@xvid.org
Sat, 25 Jan 2003 03:40:47 +0100


Thus spake Radek Czyz (radoslaw@syskin.cjb.net):
> if (x) { // 50% incorrect prediction
>   x = ABS(x) // ~40% incorrect prediction
> ...
> ... and the same for y.

Good to know!

Here is a small piece of code for your consideration:

  static inline int abs(int x) { return x<0?-x:x; }

  static inline int newabs(int x) {
    return (((x>>(sizeof(int)*8-1)) & (x^(-x))) ^ x);
  }

newabs calculates the absolute value of a signed integer without branches.

Now, whether this is actually faster depends on the architecture and on
the actual distribution of the numbers.
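
For anyone who wants to convince themselves the two versions agree, here
is a minimal sketch of a check. It assumes that >> on a negative int is
an arithmetic shift, which C leaves implementation-defined (it holds on
the x86 compilers discussed here), and it stays away from INT_MIN, where
-x overflows. The names branchy_abs and check_range are made up for this
example; CHAR_BIT replaces the literal 8.

```c
#include <assert.h>
#include <limits.h>

/* Reference version with a conditional, same logic as abs() above.
   Renamed (hypothetical name) to avoid clashing with the libc abs(). */
static inline int branchy_abs(int x) { return x < 0 ? -x : x; }

/* Branch-free version from above.
   Assumption: >> on a negative int shifts in copies of the sign bit. */
static inline int newabs(int x) {
    int mask = x >> (sizeof(int) * CHAR_BIT - 1); /* -1 if x < 0, else 0 */
    return (mask & (x ^ -x)) ^ x;                 /* picks -x or x */
}

/* Hypothetical helper: asserts that both versions agree on [lo, hi].
   Keep lo above INT_MIN, since -INT_MIN is undefined behaviour. */
static void check_range(int lo, int hi) {
    for (int x = lo; x <= hi; ++x)
        assert(newabs(x) == branchy_abs(x));
}
```

Calling check_range(-100000, 100000) covers the two values from the
timing table; it says nothing about speed, only about correctness.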

Here are the timings on my machines (in CPU cycles):

		Athlon XP	Pentium 3	VIA C3
abs(-32769)	24		19		5(20)
newabs(-32769)	3		3		8
abs(32769)	0		1		3
newabs(32769)	3		3		8

The VIA C3 value for abs(-32769) is misleading: the VIA documentation
says a mispredicted jump costs 12 cycles, but the VIA branch predictor
behaves more erratically, so the usual result is 20 cycles. It was 5 in
only one of the 10 test runs.

The good thing about this algorithm is that it can be done in parallel
using MMX, so we can theoretically do abs(X) and abs(Y) simultaneously.
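
To show what "in parallel" would mean, here is a scalar sketch that
applies the same mask trick to X and Y as two lanes of one value. This
is not MMX code: the vec2 type and abs2 function are hypothetical
stand-ins for a packed register. In an MMX version each step would map
onto one packed instruction (PSRAW/PSRAD for the shift, PSUBW/PSUBD from
zero for the negation, PAND and PXOR for the rest). The same arithmetic
shift assumption applies as for newabs.

```c
#include <limits.h>

/* Hypothetical stand-in for a packed register holding two lanes. */
typedef struct { int x, y; } vec2;

/* Branch-free absolute value of both lanes, step by step.
   Assumption: >> on a negative int is an arithmetic shift. */
static inline vec2 abs2(vec2 v) {
    const int shift = (int)(sizeof(int) * CHAR_BIT - 1);

    /* Per-lane sign mask: -1 for negative lanes, 0 otherwise. */
    vec2 mask = { v.x >> shift, v.y >> shift };

    /* Per-lane negation (0 - v), then v ^ -v. */
    vec2 diff = { v.x ^ (0 - v.x), v.y ^ (0 - v.y) };

    /* Select: (mask & diff) ^ v yields -v where mask is set, else v. */
    vec2 r = { (mask.x & diff.x) ^ v.x, (mask.y & diff.y) ^ v.y };
    return r;
}
```

No lane ever looks at the other one and there is no conditional, which
is what makes the operation a candidate for packed execution.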

> Do you know better tools to do this stuff? CodeAnalyst ignores asm-ed
> code, doesn't allow me to run some functions several times (don't ask
> for details, 300MB needed for a single call of SearchP is a hint here)
> and so on...

Maybe we should talk to the Bochs people.  They already have a CPU
simulator including MMX and SSE.  It should be easy for them to output
data about pipeline stalls and branch prediction once someone gives
them a cycle profile of the different instructions on the given CPU.

Felix