[XviD-devel] vfw 2pass stats
Michael Militzer
xvid-devel@xvid.org
Mon, 14 Oct 2002 01:26:04 +0200
Hi,
> > Did someone try to implement this ? Otherwise i'll like to try it
> > myself.
>
> yes, Zhihai He's work at http://vision.ece.ucsb.edu/alumni/zhihai/ . i
> started work on it but got sidetracked, i'll see if i can finish it up. it
> will only be a 2-pass implementation to start with, as it needs to be
> initialized by sample input, and works on a per-frame level instead of
> per-block as xvidcore currently does. we should just have to send Qz and
> Qnz (to be calculated in mbcoding in 1st pass) back to vfw for 2nd pass
> quantizer calculation.
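To illustrate, sending Qz and Qnz back could just mean extending the per-frame first-pass stats record. A rough sketch (all field names are purely illustrative, not the actual xvidcore/vfw stats layout):

```c
/* Hypothetical extension of a per-frame first-pass stats record:
 * alongside the usual size/quant fields, carry the two terms
 * (called Qz/Qnz in the mail, computed in mbcoding during the 1st
 * pass) so that vfw can solve for the 2nd-pass quantizer.
 * Field names are illustrative only. */
typedef struct {
    int frame_type;  /* I- or P-frame */
    int length;      /* coded bits in the first pass */
    int quant;       /* quantizer used in the first pass */
    double qz;       /* zero-coefficient statistic from mbcoding */
    double qnz;      /* nonzero-coefficient statistic from mbcoding */
} twopass_stat_t;
```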
As I mentioned earlier, I'll do a study project about rate control this
semester. As part of the study I will also implement the p-domain source
modelling proposed by Zhihai (so let's see if the method is really as good
as he claims...). The study will also require some research on human visual
perception (and on exploiting it to improve perceived quality in moving
pictures...). So I hope the results will help to further improve XVID
(well, a good 1pass CBR mode will be just a side-effect of the study, but I
hope to also improve 2pass and our "psychovisual" features (ok, currently
that's only lumi masking)).
Some notes from my side regarding the use of p-domain source modelling in
2pass rate control: I don't think that using the u/k/mblocks to estimate the
zero DCT coefficients (as Edouard proposed) will give good results. Even
more, I think that any attempt to calculate the quant value outside the core
will fail. So Dan's idea of calculating the quant from first-pass
information within vfw (similar to what we're currently doing) will not work
well. Let me explain why: p-domain source modelling is a unified rate
estimation method, which means it works for both images and videos.
Essentially it only works for images; for videos (our case) it operates on
motion-compensated data (which is an image again...). Since the
motion-compensated images for the same frame in the first and second pass
are quite different (because the reference frame is coded/quantized with
different parameters in the second pass), information collected during the
first pass might no longer be applicable in the second pass (at least we
would introduce a considerable approximation error). Also, the needed DCT
coefficient information cannot be generated in mbcoding, since there we only
have quantized DCT coefficients, but Zhihai's formulas need unquantized DCT
data iirc. I could be wrong, because it's been some time since I read
Zhihai's paper, but afaik the idea is that the percentage of zeros (p)
within the _unquantized_ DCT coefficients is considered. (Since p correlates
with q (quant), the rate curve R is not only a function of q but also of
p...)
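To make that concrete, here is a rough sketch (not xvidcore code; the 2*q dead-zone threshold is a simplification of H.263-style quantization, and the linear rate model with a single theta parameter is the simplest form of He's p-domain model):

```c
#include <stdlib.h>

/* Fraction of DCT coefficients that would quantize to zero at
 * quantizer q. With a dead-zone quantizer, a coefficient c maps to
 * zero roughly when |c| < 2*q (simplified; the real quantizers in
 * xvidcore differ in detail). This is why unquantized coefficients
 * are needed: rho(q) can then be evaluated for any candidate q. */
static double rho(const int *coeffs, int n, int q)
{
    int zeros = 0;
    for (int i = 0; i < n; i++)
        if (abs(coeffs[i]) < 2 * q)
            zeros++;
    return (double)zeros / n;
}

/* Linear p-domain rate model: predicted bits R(rho) = theta*(1-rho),
 * with theta estimated from one (rho, bits) observation, e.g. taken
 * from the first pass. */
static double predict_bits(double theta, double r)
{
    return theta * (1.0 - r);
}

/* Pick the smallest quantizer (MPEG-4 range 1..31) whose predicted
 * rate fits the per-frame bit budget. */
static int pick_quant(const int *coeffs, int n, double theta,
                      double budget)
{
    for (int q = 1; q <= 31; q++)
        if (predict_bits(theta, rho(coeffs, n, q)) <= budget)
            return q;
    return 31;
}
```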
My suggestion would be to put the whole 2pass stuff into the core. I think
Pete has already worked on integrating Ed's 2pass lib into the core (at
least I chatted with him about this on irc). 2pass could then work by
reading in the stats data (maybe during the encoder create phase, since this
data is "global"), then lowering all the frame sizes to achieve the desired
overall bitrate/target filesize, and finally calculating the real quant for
the desired frame size using p-domain source modelling (after motion
compensation, of course... ;-)).
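The "lowering all the frame sizes" step could be as simple as proportional scaling, e.g. (a sketch only, ignoring per-frame minimums, header overhead and I/P weighting that a real 2pass implementation would need):

```c
/* Scale the first-pass frame sizes so their sum matches the target
 * file size. Purely illustrative; in the proposal above this would
 * run once after the stats data is read in during encoder create. */
static void scale_frame_sizes(double *sizes, int n, double target_total)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += sizes[i];
    double factor = target_total / total;
    for (int i = 0; i < n; i++)
        sizes[i] *= factor;
}
```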
bye,
Michael