[XviD-devel] optimizing branches away (was: Re: Re[4]: Quality optimization)

Felix von Leitner xvid-devel@xvid.org
Wed, 29 Jan 2003 02:34:15 +0100


Thus spake James Hauxwell (james.hauxwell@st.com):
> I did some tests with MSVC vs Intel compiler version of abs().

Eliminating branches is normally a good thing.  However, history-based
branch prediction (as used on all modern CPUs) predicts clamping code
almost perfectly when the values rarely go out of range, so the
expensive misprediction that we are trying to avoid never happens.

I learned this the hard way yesterday when I found this obfuscated
branchless code for clamping:

  i+=((-i+127)>>31)&(-i+127);
  i-=((i+128)>>31)&(i+128);

It turned out to be slower than the version with branches on my Athlon,
because the clamping case never happened and the branches were all
predicted correctly.
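
For anyone who wants to check the trick, here is a minimal sketch that
puts the branchless clamp next to the obvious branching one and verifies
that they agree.  The function names are made up for illustration.  The
branchless version assumes a 32-bit int and an arithmetic right shift on
negative values (implementation-defined in C, but what the usual x86
compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Obvious version: two compare-and-branch steps, range [-128, 127]. */
static int32_t clamp_branch(int32_t i) {
  if (i > 127) return 127;
  if (i < -128) return -128;
  return i;
}

/* Branchless version from above.  Assumes >> on a negative int32_t is an
   arithmetic shift, and that i is far enough from the int32_t limits
   that -i+127 and i+128 cannot overflow. */
static int32_t clamp_branchless(int32_t i) {
  /* i > 127: -i+127 is negative, mask is all ones, adds 127-i */
  i += ((-i + 127) >> 31) & (-i + 127);
  /* i < -128: i+128 is negative, mask is all ones, subtracts i+128 */
  i -= ((i + 128) >> 31) & (i + 128);
  return i;
}

/* Hypothetical self-check: both versions must agree on a wide range. */
static void check_clamps(void) {
  for (int32_t i = -100000; i <= 100000; ++i)
    assert(clamp_branch(i) == clamp_branchless(i));
}
```

Calling check_clamps() from main() is enough to confirm the equivalence.
Which of the two is faster depends on how often the input actually goes
out of range, so it has to be measured on real data.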

Well, it was a day well spent -- I learned a lot, although my
optimizations turned out to be pessimizations in the end.  In case any
of you enjoy stuff like this, here is my optimized
float-to-int-with-saturation code:

Original:

static int old(float* flt) {
  int off=32768;
  int val;
  val=vorbis_ftoi(*flt*32768.f);
  if(val>32767)val=32767;
  else if(val<-32768)val=-32768;
  return val+off;
}

Mine:

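static int test(float* flt) {
  int off=32768;
  /* reinterpret the float's bits; assumes 32-bit IEEE-754 floats and
     32-bit ints (the pointer cast is not strict-aliasing clean) */
  unsigned int tmp=*(unsigned int*)flt;
  int x=(tmp&0x7fffff);   /* 23-bit mantissa */
  /* exponent minus 128, truncated to 8 bits; assumes char is signed */
  char e=(tmp>>23)-128;
  /* set the implicit leading 1 only if the mantissa is nonzero, so 0.0
     stays 0; exact powers of two (mantissa 0) also come out as 0 */
  x|=(-x)&0x800000;
  /* if (e<0) x>>=-e; else x<<=e; */
  x<<=(-e>>7)&e; x>>=((e>>7)&-e)+(23-16);
  /* if (s) x=-x; */
  /* the second term is meant to clamp at -32768, but it is computed from
     x before the negation, so it normally sees x>=0 and does nothing */
  x-=(((signed int)tmp>>31)&(x<<1)) + (((x+32768)>>31)&(x+32768));
  /* clamp at 32767 */
  x+=((-x+32767)>>31)&(-x+32767);
  return x+off;
}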
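
To make the bit fiddling easier to follow, here is a readable sketch of
the same decomposition with ordinary branches.  It is not the code I
timed: the struct and function names are hypothetical, it truncates
toward zero like the shifts above (vorbis_ftoi may round differently),
and it returns the signed value without adding the 32768 offset.  It also
treats the corner cases explicitly: zero and denormals give 0, exact
powers of two keep their implicit 1, and both ends saturate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the fields of an IEEE-754 single. */
struct f32_fields {
  int negative;          /* sign bit */
  int exponent;          /* unbiased exponent (raw exponent - 127) */
  uint32_t significand;  /* 24 bits incl. implicit 1; 0 for zero/denormals */
};

static struct f32_fields f32_split(float f) {
  struct f32_fields r;
  uint32_t bits, raw_exp;
  memcpy(&bits, &f, sizeof bits);  /* bit copy without a pointer cast */
  raw_exp = (bits >> 23) & 0xff;
  r.negative = (int)(bits >> 31);
  r.exponent = (int)raw_exp - 127;
  /* raw exponent 0 means zero or denormal: flush to 0 here */
  r.significand = raw_exp ? ((bits & 0x7fffff) | 0x800000) : 0;
  return r;
}

/* Truncate f*32768 toward zero and saturate to [-32768, 32767].
   Inf and NaN saturate according to their sign bit. */
static int32_t f32_to_s16(float f) {
  struct f32_fields p = f32_split(f);
  int32_t mag;
  if (p.exponent >= 0) {
    mag = 32768;                    /* |f| >= 1.0: out of range */
  } else {
    /* f = significand * 2^(exponent-23); times 2^15 that is a right
       shift by 8-exponent, which is at least 9 in this branch */
    int amount = 8 - p.exponent;
    mag = amount > 24 ? 0 : (int32_t)(p.significand >> amount);
  }
  if (p.negative) return -mag;      /* -32768 is representable */
  return mag > 32767 ? 32767 : mag;
}

/* Hypothetical self-check with exactly representable inputs. */
static void check_f32_to_s16(void) {
  assert(f32_to_s16(0.75f) == 24576);
  assert(f32_to_s16(-0.75f) == -24576);
  assert(f32_to_s16(0.5f) == 16384);
  assert(f32_to_s16(0.0f) == 0);
  assert(f32_to_s16(1.0f) == 32767);
  assert(f32_to_s16(-1.0f) == -32768);
}
```

Something like this is handy as a reference to compare a branchless
version against before trusting any timing numbers.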

Felix