[XviD-devel] VHQ

Christoph Lampert chl at math.uni-bonn.de
Wed Feb 12 14:58:02 CET 2003


On Wed, 12 Feb 2003, Radek Czyz wrote:
> What I discovered: when we do that, PSNR drops. This is because when
> we're minimizing MV bits, we move away from the 'best match' in the
> sense of SAD. We can save some bits, but the SAD of the match -
> which means the SAD of the MB in the new picture [no DCT to correct] -
> will get worse.
> 
> This is not a big deal at quantizer 2, and it doesn't happen often at
> quantizer 2 anyway. But at q6 it makes a _big_ difference.

You mean that even among all the positions with cbp==0 there can be
big differences in how "good" the positions actually are? For quant==2
there aren't many positions with cbp==0 (because cbp is a statement about
the quantized DCT of the difference), but for quant==6 there are many such
positions, and simply optimizing vector bits within them (which means
moving closer towards the prediction vector) is bad, because cbp==0 only
means the residue is below some level, not "how far below".
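
Just to spell that out (a simplified sketch, not the actual XviD code;
quantize() here stands in for whichever quantizer routine is active):

#include <stdint.h>

/* cbp contribution of one 8x8 block of DCT'd residue: it only says  */
/* whether ALL quantized coefficients are zero, nothing about how    */
/* large the unquantized residue is - at quant==6 the dead zone that */
/* maps everything to zero is much wider than at quant==2            */
int quantize(int coeff, int quant);   /* hypothetical quantizer call */

int block_is_coded(const int16_t dct_res[64], int quant)
{
    int i;
    for (i = 0; i < 64; i++)
        if (quantize(dct_res[i], quant) != 0)
            return 1;                 /* block sets its bit in cbp   */
    return 0;                         /* contributes nothing to cbp  */
}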

I guess it would be logical then not to use "bits" but the difference of
the _unquantized_ DCT data, and since the DCT is (in theory) an orthogonal
transform, that is the same as the SSE of the spatial residue, with SAD
acting only as an approximation of it.
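
As a minimal sketch of what I mean (spatial domain only; by Parseval the
SSE of the residue equals the SSE of its coefficients under an orthonormal
DCT, so the "unquantized DCT difference" collapses to plain SSE):

#include <stdint.h>
#include <stdlib.h>

/* SAD and SSE over an 8x8 block; sse8() is what the unquantized-DCT */
/* criterion would measure (Parseval), sad8() is only a cheaper      */
/* approximation of it                                                */
static int sad8(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int x, y, s = 0;
    for (y = 0; y < 8; y++)
        for (x = 0; x < 8; x++)
            s += abs(cur[y*stride + x] - ref[y*stride + x]);
    return s;
}

static int sse8(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int x, y, s = 0;
    for (y = 0; y < 8; y++)
        for (x = 0; x < 8; x++) {
            int d = cur[y*stride + x] - ref[y*stride + x];
            s += d * d;
        }
    return s;
}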

So what we would need is a criterion for _when_ the bits saved actually
outweigh the residual error, and it seems "needed bits" alone is not
enough for this :-(  The current implementation of the search uses some
kind of Lagrangian with a fixed lambda value (depending on the quantizer),
but that doesn't seem to be perfect either, of course.
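
Roughly this shape, I mean (a sketch only, not the literal code; the
lambda value would come from some per-quantizer table):

/* current style of criterion: Lagrangian cost D + lambda*R, with    */
/* lambda fixed per quantizer; the search keeps the candidate vector */
/* with the smallest cost                                             */
int rd_cost(int sse, int vector_bits, int texture_bits, int lambda)
{
    return sse + lambda * (vector_bits + texture_bits);
}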

Now what?
Maybe we can use a method that doesn't optimize the total number of bits,
but only the bits needed for residue coding? So we start from the best
position after ME - if possible, the position with the best SSE, not the
best SAD, but whatever...

Then, we ignore vector bits and use gradient descent on the bits needed
for the texture. That way, if we started with cbp==0 (so bits==0) we
wouldn't move at all, but if we started with bits==N, we would lower this
number by at least 1 in every successful step of refinement. At the same
time, the vector can get longer by at most 1 per step (if we use a small
diamond pattern), so it shouldn't need more than 1 extra bit per step to
encode. When we end up in a local minimum of residue bits, we would at
least not have _raised_ the total number of bits - see the sketch below.
But maybe this leads to no effect at all, or the motion field gets
garbled; I have no idea, just thinking aloud...
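
For concreteness, the refinement I have in mind would look roughly like
this (pure sketch; CountTextureBits() and the vector bookkeeping are made
up for illustration):

/* small-diamond descent on the bits needed for the texture only,    */
/* ignoring vector bits; start at the best SSE/SAD position from ME  */
typedef struct { int x, y; } VEC;

int CountTextureBits(VEC v);          /* hypothetical: bits to code   */
                                      /* the quantized residue at v   */
VEC refine_texture_bits(VEC v, int start_bits)
{
    static const VEC diamond[4] = { {0,-1}, {-1,0}, {1,0}, {0,1} };
    int best = start_bits;

    if (best == 0)                    /* cbp==0 already: don't move   */
        return v;

    for (;;) {
        int i, improved = 0;
        for (i = 0; i < 4; i++) {
            VEC c;
            int bits;
            c.x = v.x + diamond[i].x;
            c.y = v.y + diamond[i].y;
            bits = CountTextureBits(c);
            if (bits < best) {        /* at least 1 texture bit saved, */
                best = bits;          /* at most 1 extra vector bit    */
                v = c;
                improved = 1;
            }
        }
        if (!improved)                /* local minimum of residue bits */
            break;
    }
    return v;
}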

gruel
