[XviD-devel] Hadamard Transform

skal xvid-devel@xvid.org
06 Sep 2002 18:19:21 +0200


	Gruel,

On Fri, 2002-09-06 at 14:42, Christoph Lampert wrote:
> On 6 Sep 2002, skal wrote:
> > > But here, since you calculate b+a anyway, and a-b, too, you need several
> > > extra bits anyway if you want to keep precision. Not? 
> > >
> > 
> > 	Take an example: a=1, b=126, on 8bit signed precision. [...]
> 
> Of _course_ there are lots of cases when it fails to calculate 2*b 
> when trying to keep precision. But "mathematically" the risk to overflow
> at 2*b isn't higher than at a+b. 

	Agreed! I wanted to point out that in practice, one has
	to keep an eye on the error variance attached to
	variables after the various stages of computation.
	For instance, after the first pass of the row-column FDCT,
	you know that AC coeffs are generally one order of
	magnitude smaller than DC (hey that's what DCT
	and the likes are chosen for!), unless you've been
	fed with random noise. This has a "non-mathematical"
	impact on whether you are going to choose: a+b = (a-b)+2b
	or: a+b =(b-a)+2a. Especially if you're running 'on the
	fringe' regarding precision.
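
	To make the choice concrete, here is a rough C sketch that
	checks whether every intermediate of each decomposition stays
	within signed 8 bits. The helper names (fits_s8,
	sum_via_2b_fits, sum_via_2a_fits) are made up for illustration
	and are not XviD code.

```c
/* Illustrative helpers only (hypothetical names, not from XviD). */

/* Returns 1 if v is representable as a signed 8-bit value. */
static int fits_s8(int v)
{
    return v >= -128 && v <= 127;
}

/* a+b computed as (a-b) + 2b: 1 if all intermediates fit in 8 bits. */
static int sum_via_2b_fits(int a, int b)
{
    return fits_s8(a - b) && fits_s8(2 * b) && fits_s8((a - b) + 2 * b);
}

/* a+b computed as (b-a) + 2a: 1 if all intermediates fit in 8 bits. */
static int sum_via_2a_fits(int a, int b)
{
    return fits_s8(b - a) && fits_s8(2 * a) && fits_s8((b - a) + 2 * a);
}
```

	With the a=1, b=126 values quoted above, the first form needs
	2b = 252 and leaves the 8-bit range, while the second only
	needs 125 and 2, so knowing which operand tends to be small
	decides which form is safe.
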
	Similarly, as for the Hadamard transform, looking at the
	H8 matrix, one can see that the "dangerous" column is the first,
	the one with only '+1' coeffs. Other ones have equal numbers
	of +1 and -1, and I'd feel like using: 
	b' = (a-e)+(b-f)+(c-g)+(d-h) instead of 
	b'=(a+b+c+d) - (e+f+g+h). A term like (a-e) is "more likely" to
	remain within "reasonable" bounds if I'm fed with video
	signal hopefully grouped around the DC value (videotaping
	chessboards is prohibited :)
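
	A small C sketch of the two groupings, tracking the largest
	intermediate magnitude each one produces. The function names
	and the int-array interface are assumptions chosen for
	illustration, not existing XviD routines.

```c
#include <stdlib.h>

/* Hypothetical helper: computes (a+b+c+d) - (e+f+g+h) on x[0..7],
 * stores the result in *out and returns the peak |intermediate|. */
static int peak_sum_then_diff(const int x[8], int *out)
{
    int peak = 0, lo = 0, hi = 0;
    for (int i = 0; i < 4; i++) {
        lo += x[i];        /* running a+b+c+d */
        hi += x[i + 4];    /* running e+f+g+h */
        if (abs(lo) > peak) peak = abs(lo);
        if (abs(hi) > peak) peak = abs(hi);
    }
    *out = lo - hi;
    if (abs(*out) > peak) peak = abs(*out);
    return peak;
}

/* Hypothetical helper: computes (a-e)+(b-f)+(c-g)+(d-h) on x[0..7],
 * stores the result in *out and returns the peak |intermediate|. */
static int peak_diff_then_sum(const int x[8], int *out)
{
    int peak = 0, acc = 0;
    for (int i = 0; i < 4; i++) {
        int d = x[i] - x[i + 4];   /* pairwise difference first */
        if (abs(d) > peak) peak = abs(d);
        acc += d;
        if (abs(acc) > peak) peak = abs(acc);
    }
    *out = acc;
    return peak;
}
```

	For samples clustered around 120, such as
	{120,122,119,121,118,120,121,119}, both forms give the same
	coefficient, but the first one goes through partial sums near
	480 while the second never exceeds a handful of units.
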
	This leads to the following: if we opt for a 'dirty' transform
	helping decision only, the above grouping of computation
	might be good enough to stay in 8bits and saturate as hell
	the outliers. Let me explain:
	With the exception of the first coeff of the Hadamard transform,
	which is the same as DC and could be taken care of separately
	if not already available somewhere, computation of the others
	could be done in 8bits + saturation as:
	b' = Sat8b(a-e) + Sat8b(b-f) + etc...
	Given that an 8x8 in-place transpose is very cheap, globally
	staying in 8bits precision to "roughly" evaluate the coeffs
	would be a non-negligible speedup... It all depends on the
	overall precision we're targeting during evaluation...
	Any opinion? Did I miss something?
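
	For the record, a scalar C sketch of what I mean by the
	saturated form Sat8b(a-e) + Sat8b(b-f) + etc. The names sat8b
	and rough_coeff are made up here, and saturating the running
	sum as well (not just each difference) is my assumption about
	what "staying in 8bits" would require; a real version would
	presumably use packed saturating instructions instead.

```c
#include <stdint.h>

/* Clamp v to the signed 8-bit range [-128, 127]. */
static int8_t sat8b(int v)
{
    return (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
}

/* Hypothetical "dirty" estimate of (a-e)+(b-f)+(c-g)+(d-h) on eight
 * unsigned 8-bit samples, keeping every stored value in 8 bits.
 * Assumption: the accumulator is saturated too, so outliers clip. */
static int8_t rough_coeff(const uint8_t x[8])
{
    int8_t acc = 0;
    for (int i = 0; i < 4; i++) {
        int8_t d = sat8b((int)x[i] - (int)x[i + 4]);  /* Sat8b(a-e), ... */
        acc = sat8b(acc + d);                         /* saturating add  */
    }
    return acc;
}
```

	On well-behaved input like {120,122,119,121,118,120,121,119}
	this returns the exact value 4. On a hard edge such as
	{255,255,255,255,0,0,0,0} the exact value would be 1020 and
	the estimate clips to 127, which is the "saturate as hell"
	behaviour and only acceptable if the result is used for
	decision making.
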


	bye,

		Skal