[XviD-devel] Hadamard Transform
skal
xvid-devel@xvid.org
06 Sep 2002 18:19:21 +0200
Gruel,
On Fri, 2002-09-06 at 14:42, Christoph Lampert wrote:
> On 6 Sep 2002, skal wrote:
> > > But here, since you calculate b+a anyway, and a-b, too, you need several
> > > extra bits anyway if you want to keep precision. Not?
> > >
> >
> > Take an example: a=1, b=126, on 8bit signed precision. [...]
>
> Of _course_ there are lots of cases when it fails to calculate 2*b
> when trying to keep precision. But "mathematically" the risk to overflow
> at 2*b isn't higher than at a+b.
Agreed! I wanted to point out that, in practice, one has
to keep an eye on the error variance attached to
variables after the various stages of computation.
For instance, after the first pass of the row-column FDCT,
you know that AC coeffs are generally one order of
magnitude smaller than DC (hey, that's what DCT
and the like are chosen for!), unless you've been
fed with random noise. This has a "non-mathematical"
impact on whether you are going to choose: a+b = (a-b)+2b
or: a+b = (b-a)+2a. Especially if you're running 'on the
fringe' regarding precision.
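To make the a=1, b=126 case concrete, here is a minimal sketch in C.
It only checks whether every intermediate of each grouping stays in
signed 8-bit range. The helper names (fits_s8, sum_via_2b_fits,
sum_via_2a_fits) are made up for illustration; they are not XviD code.

```c
#include <stdint.h>

/* Returns 1 if v is representable as a signed 8-bit value. */
static int fits_s8(int v)
{
    return v >= INT8_MIN && v <= INT8_MAX;
}

/* a+b computed as (a-b) + 2*b: returns 1 only if a-b, 2*b and the
 * final sum all stay within signed 8-bit range. */
static int sum_via_2b_fits(int a, int b)
{
    return fits_s8(a - b) && fits_s8(2 * b) && fits_s8((a - b) + 2 * b);
}

/* a+b computed as (b-a) + 2*a: same check on b-a, 2*a and the sum. */
static int sum_via_2a_fits(int a, int b)
{
    return fits_s8(b - a) && fits_s8(2 * a) && fits_s8((b - a) + 2 * a);
}
```

With a=1, b=126 the first grouping fails on 2*b = 252, while the
second only needs 125, 2 and 127, all of which fit. Swap the operands
and the verdict swaps too, which is why knowing which variable is
"usually small" matters.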
Similarly, as for the Hadamard transform, looking at the
H8 matrix, one can spot that the "dangerous" column is the first,
the one with only '+1' coeffs. The other ones have equal numbers
of +1 and -1, and I'd feel like using:
b' = (a-e)+(b-f)+(c-g)+(d-h) instead of
b' = (a+b+c+d)-(e+f+g+h). A term like (a-e) is "more likely" to
remain within "reasonable" bounds if I'm fed with a video
signal hopefully grouped around the DC value (videotaping chessboards
is prohibited :)
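A small sketch of the difference between the two groupings, in C.
Each function returns the largest absolute value reached by any
intermediate and writes the coefficient to *result. The function
names and the sample values are assumptions for illustration only.

```c
#include <stdlib.h>

/* Grouping (a+b+c+d) - (e+f+g+h), with x[0..7] = a..h.
 * Returns the peak |intermediate|; stores the coefficient in *result. */
static int peak_sum_first(const int x[8], int *result)
{
    int s0 = x[0] + x[1] + x[2] + x[3];
    int s1 = x[4] + x[5] + x[6] + x[7];
    int peak = abs(s0);

    *result = s0 - s1;
    if (abs(s1) > peak) peak = abs(s1);
    if (abs(*result) > peak) peak = abs(*result);
    return peak;
}

/* Grouping (a-e) + (b-f) + (c-g) + (d-h): differences first, then
 * accumulate. Tracks each difference and each partial sum. */
static int peak_diff_first(const int x[8], int *result)
{
    int peak = 0, acc = 0, i;

    for (i = 0; i < 4; i++) {
        int d = x[i] - x[i + 4];
        if (abs(d) > peak) peak = abs(d);
        acc += d;
        if (abs(acc) > peak) peak = abs(acc);
    }
    *result = acc;
    return peak;
}
```

For a row such as {120, 122, 119, 121, 118, 123, 120, 119}, both
groupings give the coefficient 2, but the sum-first version passes
through 482 on the way, whereas the difference-first version never
exceeds 2.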
This leads to the following: if we opt for a 'dirty' transform
that only helps the decision, the above grouping of computations
might be good enough to stay in 8 bits and saturate the outliers
as hell. Let me explain:
With the exception of the first coeff of the Hadamard transform,
which is the same as DC and could be taken care of separately if not
already available somewhere, computation of the others could be
done in 8 bits + saturation as:
b' = Sat8b(a-e) + Sat8b(b-f) + etc...
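A rough scalar sketch of that idea in C, for the (a-e)+(b-f)+(c-g)+(d-h)
coefficient. Two things here are assumptions rather than anything
stated above: that Sat8b clamps to the signed range [-128, 127], and
that the additions saturate as well (as a packed saturating add would
do). The names sat8, coeff_sat8 and coeff_exact are invented for the
example.

```c
#include <stdint.h>

/* Clamp to the signed 8-bit range [-128, 127] (assumed meaning of Sat8b). */
static int8_t sat8(int v)
{
    if (v < INT8_MIN) return INT8_MIN;
    if (v > INT8_MAX) return INT8_MAX;
    return (int8_t)v;
}

/* Rough estimate of (a-e)+(b-f)+(c-g)+(d-h) for pixels p[0..7] = a..h.
 * Every difference and every partial sum is saturated to 8 bits.
 * Saturating the sums is an assumption made for this sketch. */
static int8_t coeff_sat8(const uint8_t p[8])
{
    int8_t acc = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int8_t d = sat8((int)p[i] - (int)p[i + 4]);
        acc = sat8((int)acc + (int)d);
    }
    return acc;
}

/* Exact value of the same coefficient, for comparison. */
static int coeff_exact(const uint8_t p[8])
{
    int acc = 0, i;

    for (i = 0; i < 4; i++)
        acc += (int)p[i] - (int)p[i + 4];
    return acc;
}
```

On a smooth row such as {120, 122, 119, 121, 118, 123, 120, 119} the
saturated and exact results agree (both 2). On a hard edge like
{255, 255, 255, 255, 0, 0, 0, 0} the exact value is 1020 and the
saturated one pins at 127: the outlier is crushed, but it still reads
as "large".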
Given that an 8x8 in-place transpose is very cheap, globally
staying in 8-bit precision to "roughly" evaluate the coeffs
would be a non-negligible speedup... It all depends on the
overall precision we're targeting during evaluation...
Any opinion? Did I miss something?
bye,
Skal