[XviD-devel] automated xvid system test?
Christoph Lampert
chl at math.uni-bonn.de
Tue May 4 17:24:47 CEST 2004
Hi Mat,
On Tue, 4 May 2004, Mat Hostetter wrote:
> I've been thinking about writing an automated xvid "system test" as a
> way of learning the encoding/decoding API.
Sure, why not. However, be warned: learning the API is going to be fast,
since there aren't many entry points. The "system test" part will surely be
more work.
> I'm envisioning a test that encodes and then decodes a video sequence,
> with each resulting frame compared against the original frame. For
> each frame, compute an error metric and have the test verify that the
> average per-frame error rate and the max error rate of any frame fall
> within some bounds. It could also verify that the total encoded bits
> generated have some reasonable relation to the requested bitrate.
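The round-trip test described above can be sketched roughly like this (a
minimal sketch: `encode_frame`/`decode_frame` are hypothetical stand-ins
for whatever codec wrappers the real test would use, frames are flat lists
of 8-bit samples, and the error bounds are placeholders):

```python
def mean_abs_error(a, b):
    """Per-frame error: mean absolute pixel difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def run_system_test(frames, encode_frame, decode_frame,
                    avg_bound=4.0, max_bound=12.0):
    """Encode, decode, compare each frame; check error bounds.

    encode_frame/decode_frame are hypothetical wrappers around the
    codec API, not part of xvid itself.
    """
    errors = []
    total_bits = 0
    for frame in frames:
        bitstream = encode_frame(frame)    # hypothetical encoder call
        total_bits += len(bitstream) * 8   # for the bitrate sanity check
        decoded = decode_frame(bitstream)  # hypothetical decoder call
        errors.append(mean_abs_error(frame, decoded))
    avg_err = sum(errors) / len(errors)
    assert avg_err <= avg_bound, "average per-frame error too high"
    assert max(errors) <= max_bound, "worst single frame too bad"
    return avg_err, max(errors), total_bits
```

A bitrate check would then compare `total_bits` against the requested
bitrate times the clip duration.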
There used to be an "xvidstat" program in the examples dir that did
a similar thing: encode, decode and compare PSNR frame by frame.
However, this is a little tricky with B-frames (because there the encoding
order differs from the decoding order), so it wasn't maintained anymore and
was finally removed. I'm sure it's still somewhere in CVS...
> Such a test could be run with various combinations of encoding flags
> and bitrates to test much of the xvid code. I suppose a code coverage
> tool could even say what cases were not hit and the test could be improved.
Remi Guyomarch wrote a Java program and test script for ffmpeg with a
similar purpose: run ffmpeg encodings with all option combinations etc.,
creating HTML tables with the results.
A mirror of the results is here: http://fatooh.org/remitests/
So most likely you don't need a special application for finding the best
options: in the end, keeping every frame's PSNR produces a huge flood
of data, and if you just want to check average results, mplayer or
transcode in combination with XviD's PSNR option will work well enough.
> As for the test input video, it could be a real video snippet, but
> that would bloat the download, especially if uncompressed.
> Alternatively I could write a program that generates numerous frames
> of geometric shapes bouncing about the screen, with some tweaks to
> make the test more interesting like slowly mutating sizes and colors,
> rotations, gradient fill patterns, etc. A few small multi-frame
> animations of some real-world video bouncing around the screen might
> add a touch of less-cartoony realism to the test too. Generating
> these images in a larger frame and then scaling it down (antialiasing)
> to the actual test frames would result in a more realistic test
> (objects would effectively be subpixel-positioned).
Hm, geometric objects won't model real images well, but maybe they're
better than nothing. There are also many standard test sequences in QCIF
(176x144) resolution available online; those aren't that big, and since the
clips only have to be downloaded once even if the application changes,
it may be worth working with them. Resolution doesn't seem to be a big
factor in image quality, so if you just want numbers (not real viewing),
testing QCIF is just as good as testing full DVD resolution, only much
faster.
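If you do go the synthetic route, the supersampling idea from your mail can
be sketched like this (an illustrative sketch, not any xvid API: render at
4x resolution, then box-filter down so objects get effective subpixel
positions):

```python
SCALE = 4  # supersampling factor; assumed value, purely illustrative

def render_square_hires(w, h, x, y, size, value=255):
    """Render a filled square into a (w*SCALE) x (h*SCALE) grayscale
    buffer. x, y, size are in output-pixel units and may be fractional."""
    W, H = w * SCALE, h * SCALE
    buf = [[0] * W for _ in range(H)]
    x0, y0 = int(x * SCALE), int(y * SCALE)
    x1, y1 = int((x + size) * SCALE), int((y + size) * SCALE)
    for row in range(max(0, y0), min(H, y1)):
        for col in range(max(0, x0), min(W, x1)):
            buf[row][col] = value
    return buf

def downscale(buf, w, h):
    """Average each SCALE x SCALE block into one output pixel
    (box-filter antialiasing)."""
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0
            for dr in range(SCALE):
                for dc in range(SCALE):
                    total += buf[r * SCALE + dr][c * SCALE + dc]
            out[r][c] = total // (SCALE * SCALE)
    return out
```

A square placed at a fractional x gets soft edges in the downscaled frame,
which is exactly the subpixel positioning effect you describe.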
> I'm not sure what the right error metric would be. An ideal error
> metric would take into account properties of the human visual system,
> but that's not realistic. A simple error metric would be the
> sum-of-absolute-differences of the pixel values before and after the
> round-trip compression/decompression. That has the drawback that a
> vertical line shifted over one pixel is a larger error than having the
> vertical line disappear altogether, which could be bad. Perhaps
> taking the "min" of absolute differences between a "new" pixel and
> each of the 9 nearest "original" pixels would be a more reasonable
> error metric, since the cases where this is pathologically inaccurate
> are less likely to occur.
In theory, SAD (sum of absolute differences) and the most commonly used
PSNR/SSD (sum of squared differences) are bad measures because, as you
describe, e.g. shifting the image by one line gives a very poor score
although to the human eye it's no problem.
If, however, you take into account that you aren't checking an arbitrary
input/output pair but the result of MPEG-4 coding, then PSNR and the
others aren't so bad, because things like a one-line shift cannot happen
there. Or rather, if they do, that's a major bug in the code, and a low
PSNR flags it nicely.
The discussion about the right error measure has been going on for
decades, including some threads on this list and in the forum. So maybe
you shouldn't start by trying to find the best one, but just keep
everything transparent, so that whoever has an idea can try their own
measure.
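For concreteness, the candidate measures from this thread can be sketched
as follows (flat grayscale frames assumed; `min9_sad` is my name for your
"min of the 9 nearest pixels" idea, not an established metric):

```python
import math

def sad(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB, for 8-bit samples."""
    mse = ssd(a, b) / len(a)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(peak * peak / mse)

def min9_sad(orig, new, w, h):
    """For each new pixel, take the min abs difference against the 3x3
    neighborhood in the original (the 'min of 9 nearest pixels' idea)."""
    total = 0
    for y in range(h):
        for x in range(w):
            total += min(
                abs(new[y * w + x] - orig[ny * w + nx])
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2)))
    return total
```

On a vertical line shifted by one pixel, plain SAD is large while
`min9_sad` is zero, which illustrates the difference you're pointing at.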
Anyway, if you have further questions or want your code to be integrated
into XviD (e.g. the examples dir, which is much too empty), just keep
posting.
gruel