[XviD-devel] Quality optimization

Marco Al xvid-devel@xvid.org
Wed, 22 Jan 2003 06:10:12 +0100


Christoph Lampert wrote:

> b) Using SAD of Hadamard-transformed blocks instead of ordinary
> SAD for _subpixel_ resolution refinement (not for ordinary search!)
> can also increase PSNR significantly (though not for anime).

Do we have some timings for a 8 bit Hadamard transform yet?
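
(For reference, the kind of inner loop I have in mind is something like the
following 4x4 SATD; rough sketch only, not XviD code, and the 8x8 case would
be four of these or a proper 8-point transform.)

/* Rough sketch only: SAD of the 4x4 Hadamard-transformed residual between
   the current and reference blocks. Names and stride convention are mine. */
#include <stdlib.h>

static int satd_4x4(const unsigned char *cur, const unsigned char *ref, int stride)
{
    int d[16], t[16];
    int i, sum = 0;

    /* residual */
    for (i = 0; i < 4; i++) {
        d[i*4+0] = cur[0] - ref[0];
        d[i*4+1] = cur[1] - ref[1];
        d[i*4+2] = cur[2] - ref[2];
        d[i*4+3] = cur[3] - ref[3];
        cur += stride;
        ref += stride;
    }

    /* horizontal 4-point Hadamard on each row */
    for (i = 0; i < 4; i++) {
        int a0 = d[i*4+0] + d[i*4+1], a1 = d[i*4+0] - d[i*4+1];
        int a2 = d[i*4+2] + d[i*4+3], a3 = d[i*4+2] - d[i*4+3];
        t[i*4+0] = a0 + a2;  t[i*4+1] = a1 + a3;
        t[i*4+2] = a0 - a2;  t[i*4+3] = a1 - a3;
    }

    /* vertical 4-point Hadamard on each column, then sum of magnitudes */
    for (i = 0; i < 4; i++) {
        int a0 = t[0*4+i] + t[1*4+i], a1 = t[0*4+i] - t[1*4+i];
        int a2 = t[2*4+i] + t[3*4+i], a3 = t[2*4+i] - t[3*4+i];
        sum += abs(a0 + a2) + abs(a1 + a3) + abs(a0 - a2) + abs(a1 - a3);
    }
    return sum;
}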

Since you are shooting for quality, how about simply using the DCT and
constructing a special LUT to do fast exact #bits calculations? (Without needing
the actual codewords or working on the bitstream it can be done a lot faster.)
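
Roughly what I have in mind (very rough sketch; vlc_len is a hypothetical
precomputed table of code lengths, and escape codes would need real handling):

/* Sum exact AC code lengths for a quantized block without ever building
   codewords or touching the bitstream. vlc_len[last][run][level] is assumed
   to have been filled in once from the MPEG-4 tables. */
static int block_bits(const short qcoeff[64], const unsigned char zigzag[64],
                      const unsigned char vlc_len[2][64][128])
{
    int i, last_nz = -1, run = 0, bits = 0;

    for (i = 63; i >= 0; i--)              /* last nonzero coeff in zigzag order */
        if (qcoeff[zigzag[i]]) { last_nz = i; break; }

    for (i = 0; i <= last_nz; i++) {
        int level = qcoeff[zigzag[i]];
        if (level == 0) { run++; continue; }
        if (level < 0) level = -level;
        if (level > 127) level = 127;      /* clamped for the sketch; real escapes differ */
        bits += vlc_len[i == last_nz][run][level];
        run = 0;
    }
    return bits;                           /* 0 if the block is empty */
}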

Simply using SAD in the transform domain still fails to take into account the
damage high-frequency components do to the bitrate, so it would still be far from
optimal. Also, why only allow it for the subpel search? Why not allow a full-pel
refinement step using it too?

A low-complexity alternative to SAD_hadamard, which I wouldn't be surprised to
see outperform it, is the summed SAD of horizontally and vertically
Sobel-filtered images.
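
Something along these lines, as a rough sketch with placeholder names: both
frames get filtered once, and the per-candidate cost is then just plain SAD on
the filtered planes (possibly added to the usual pixel-domain SAD).

/* Produce |Sobel_x| and |Sobel_y| planes for one frame; same stride as the
   source assumed, borders left untouched for brevity. */
#include <stdlib.h>

static void sobel_xy(const unsigned char *src, int stride, int width, int height,
                     unsigned char *gx, unsigned char *gy)
{
    int x, y;
    for (y = 1; y < height - 1; y++) {
        for (x = 1; x < width - 1; x++) {
            const unsigned char *p = src + y * stride + x;
            int sx = -p[-stride-1] + p[-stride+1]
                     - 2*p[-1]      + 2*p[+1]
                     - p[ stride-1] + p[ stride+1];
            int sy = -p[-stride-1] - 2*p[-stride] - p[-stride+1]
                     + p[ stride-1] + 2*p[ stride] + p[ stride+1];
            sx = abs(sx); if (sx > 255) sx = 255;
            sy = abs(sy); if (sy > 255) sy = 255;
            gx[y * stride + x] = (unsigned char)sx;
            gy[y * stride + x] = (unsigned char)sy;
        }
    }
}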

> Proposal #3: Implement "intelligent" quantization, e.g. Trellis.
> (or port it from Ffmpeg ;-)

I think it could be improved; you could roll the greedy algorithm from the
following paper into the R-D optimization procedure.

http://www-it.et.tudelft.nl/~inald/pubs/Image%20Quality/Adaptive%20spatial%20noise%20shaping%20for%20DCT%20based%20image%20compression%201996.pdf

It is a precursor to the more general method of this paper:

http://www-it.et.tudelft.nl/~inald/pubs/Image%20Quality/bnl96.pdf

Of that more general method, only the idea of performing the optimization of
the coefficients in order of their perceptual significance is really relevant
to my proposed scheme, though.
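
Roughly how I picture rolling the greedy pass into the R-D optimization (very
rough sketch; distortion() and rate() are placeholders for the weighted error
measure and the exact bit count):

/* Walk the quantized levels in some perceptual-significance order, try
   pulling each one step towards zero, and keep the change whenever the
   Lagrangian cost J = D + lambda*R goes down. */
static void greedy_tune(short q[64], const short orig[64], const int order[64],
                        double lambda,
                        double (*distortion)(const short *, const short *),
                        int (*rate)(const short *))
{
    int i;
    double best = distortion(q, orig) + lambda * rate(q);

    for (i = 0; i < 64; i++) {
        int idx = order[i];               /* most significant coefficients first */
        short saved = q[idx];
        double cost;

        if (saved == 0)
            continue;
        q[idx] = (short)(saved > 0 ? saved - 1 : saved + 1);
        cost = distortion(q, orig) + lambda * rate(q);
        if (cost < best)
            best = cost;                  /* keep the cheaper level */
        else
            q[idx] = saved;               /* revert */
    }
}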

This can be easily integrated with luminance masking ... and, more importantly,
can be used to combat blocking. You can minimize the error in given pixels by
giving them a greater perceptual weight. So not only could you use
luminance/texture/motion masking (albeit only in the spatial domain, though I
think they will work better there than in the DCT domain anyway), you could
also simply assign pixels near the block border greater weights; this
effectively migrates quantization error to the inside of the block.
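
For illustration, a perceptually weighted spatial MSE with made-up border
weights might look like this (sketch only, the actual weights would need
tuning, and the other masking terms could be folded into the same per-pixel
weight):

static double weighted_mse_8x8(const unsigned char *cur, const unsigned char *rec,
                               int stride)
{
    /* heavier weights near the block border; numbers are just for illustration */
    static const double w[8] = { 2.0, 1.5, 1.0, 1.0, 1.0, 1.0, 1.5, 2.0 };
    double sum = 0.0, wsum = 0.0;
    int x, y;

    for (y = 0; y < 8; y++)
        for (x = 0; x < 8; x++) {
            double weight = w[y] * w[x];
            double d = (double)cur[y*stride + x] - (double)rec[y*stride + x];
            sum  += weight * d * d;
            wsum += weight;
        }
    return sum / wsum;   /* perceptually weighted spatial MSE */
}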

A natural extension of this is quality-based coding: since the method gives you
a quality measure for free (the perceptually weighted spatial MSE), you could
use it for the dquant decision. (That decision also has a lot of room for
optimization, BTW; the number of bits needed to encode dquant really needs to
be taken into account, and the only way to do that is an optimization not
unlike the trellis-based search for coefficients. If you simply had a minimum
quality target for each MB, though, it would be a lot easier than an R-D
optimal search using global distortion.)
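
A toy sketch of what I mean by a minimum quality target per MB, where
mb_weighted_mse() stands in for coding the MB at a given quant and measuring
the weighted MSE:

static int choose_mb_quant(int frame_quant, double target_mse,
                           double (*mb_weighted_mse)(int quant))
{
    int q;
    int lo = frame_quant - 2 < 1  ? 1  : frame_quant - 2;   /* usual +/-2 dquant range */
    int hi = frame_quant + 2 > 31 ? 31 : frame_quant + 2;

    for (q = hi; q >= lo; q--)               /* coarsest allowed quant first */
        if (mb_weighted_mse(q) <= target_mse)
            return q;                        /* cheapest quant meeting the target */
    return lo;                               /* target unreachable within the range */
}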

> Proposal #4: Implement a better criterion for dynamic bframes.
> (first we have to find one, but if necessary, just use brute force)

With fixed groups of b-frames brute forcing is possible, but with a variable
number of b-frames per group it seems utterly intractable.

How about, for a sequence of frames, performing ME for some temporal
subsamplings of the sequence too? (So ME between every 2nd frame in the
sequence, between every 4th frame, etc., up to the point where you only have 2
frames in a sequence of max_bframes size.) This would give you a much better
idea of the number of bits needed to encode the sequence with a given number
of b-frames (especially if ME gives you something better than SAD as a
predictor of the number of texture bits; this ties in with proposal #1).
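
Sketched out, with estimate_me_cost() standing in for the ME pass returning
whatever bit predictor we end up with:

/* Run ME between frames 1, 2, 4, ... apart inside a window of max_bframes+1
   frames and record the summed predicted cost per frame distance. */
static void probe_bframe_costs(int window_len,          /* max_bframes + 1 */
                               long (*estimate_me_cost)(int from, int to),
                               long cost_for_dist[])    /* window_len entries */
{
    int dist, from;

    for (dist = 1; dist < window_len; dist *= 2) {
        long total = 0;
        for (from = 0; from + dist < window_len; from += dist)
            total += estimate_me_cost(from, from + dist);
        cost_for_dist[dist] = total;    /* cost of predicting across this distance */
    }
}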

Marco