[XviD-devel] automated xvid system test?

Christoph Lampert chl at math.uni-bonn.de
Tue May 4 17:24:47 CEST 2004


Hi Mat,

On Tue, 4 May 2004, Mat Hostetter wrote:
> I've been thinking about writing an automated xvid "system test" as a
> way of learning the encoding/decoding API.

Sure, why not. However, be warned: learning the API is going to be fast,
there aren't many entry points. The "system test" part will surely be more
work.
 
> I'm envisioning a test that encodes and then decodes a video sequence,
> with each resulting frame compared against the original frame.  For
> each frame, compute an error metric and have the test verify that the
> average per-frame error rate and the max error rate of any frame fall
> within some bounds.  It could also verify that the total encoded bits
> generated have some reasonable relation to the requested bitrate.

There used to be an "xvidstat" program in the examples dir that did
a similar thing: encode, decode and compare PSNR frame by frame.
However, this is a little tricky with B-frames (because encoding order
differs from decoding order there), so it was no longer maintained and
was finally removed. I'm sure it's still somewhere in CVS...
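
For what it's worth, a minimal sketch of how a round-trip test can deal
with the B-frame delay: the decoder emits frames in display order (just
delayed), so you can keep a FIFO of the originals and pop one whenever
the decoder actually produces an output frame. The encode_frame() and
decode_frame() wrappers below are hypothetical stand-ins for the
xvid_encore()/xvid_decore() calls, not the real API:

#include <math.h>
#include <stdio.h>
#include <string.h>

#define W 176
#define H 144

/* Hypothetical wrappers around the XviD calls; decode_frame() returns 0
 * while the decoder has no output yet (e.g. while buffering B-frames). */
int encode_frame(const unsigned char *in, unsigned char *bits, int *len);
int decode_frame(const unsigned char *bits, int len, unsigned char *out);

static double frame_psnr(const unsigned char *a, const unsigned char *b)
{
    double sse = 0.0;
    int i;
    for (i = 0; i < W * H; i++) {
        double d = (double)a[i] - (double)b[i];
        sse += d * d;
    }
    if (sse == 0.0)
        return 99.0;                      /* identical frames */
    return 10.0 * log10(255.0 * 255.0 * W * H / sse);
}

void compare_sequence(unsigned char frames[][W * H], int n)
{
    unsigned char fifo[8][W * H];         /* originals awaiting output */
    unsigned char bits[W * H], out[W * H];
    int head = 0, tail = 0, len, i;

    for (i = 0; i < n; i++) {
        memcpy(fifo[tail++ % 8], frames[i], W * H);
        encode_frame(frames[i], bits, &len);
        if (decode_frame(bits, len, out)) /* output is in display order */
            printf("frame %d: %.2f dB\n", head,
                   frame_psnr(fifo[head++ % 8], out));
    }
    /* a real test would also flush the codec's delayed frames here */
}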

> Such a test could be run with various combinations of encoding flags
> and bitrates to test much of the xvid code.  I suppose a code coverage
> tool could even say what cases were not hit and the test could be improved.

Remi Guyomarch wrote a Java program and test script for ffmpeg with a
similar purpose: run ffmpeg encodings with all combinations of options,
creating HTML tables of the results.

A mirror of results is here:   http://fatooh.org/remitests/ 

So most likely you don't need a special application for finding the best
options, because in the end, keeping every frame's PSNR is a flood of
data, and if you just want to check average results, mplayer or
transcode in combination with XviD's PSNR option will work well enough.
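
One subtlety if you do compute an average yourself: accumulating the
squared error over the whole sequence and converting to dB once at the
end gives a different number than averaging the per-frame PSNR values,
so the test should pick one convention and stick to it. A sketch of the
former, luma only, 8-bit samples:

#include <math.h>

/* Sequence PSNR from the total squared error of all frames. */
double sequence_psnr(const unsigned char **orig, const unsigned char **dec,
                     int frames, int pixels_per_frame)
{
    double sse = 0.0;
    int f, i;
    for (f = 0; f < frames; f++)
        for (i = 0; i < pixels_per_frame; i++) {
            double d = (double)orig[f][i] - (double)dec[f][i];
            sse += d * d;
        }
    return 10.0 * log10(255.0 * 255.0 * frames * pixels_per_frame / sse);
}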


> As for the test input video, it could be a real video snippet, but
> that would bloat the download, especially if uncompressed.
> Alternatively I could write a program that generates numerous frames
> of geometric shapes bouncing about the screen, with some tweaks to
> make the test more interesting like slowly mutating sizes and colors,
> rotations, gradient fill patterns, etc.  A few small multi-frame
> animations of some real-world video bouncing around the screen might
> add a touch of less-cartoony realism to the test too.  Generating
> these images in a larger frame and then scaling them down
> (antialiasing) to the actual test frame size would result in a more
> realistic test (objects would effectively be subpixel-positioned).

Hm, geometric objects won't model real images well, but maybe they are
better than nothing. There are also many standard test sequences in QCIF
(176x144) resolution available online; those aren't that big, and since
clips only have to be downloaded once, even if the application changes,
maybe it's worth working with them. Resolution doesn't seem to be a big
factor in image quality, so if you just want numbers (not real viewing),
testing QCIF is just as good as testing full DVD resolution, only much
faster.
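
If you do go the synthetic route anyway, the supersampling idea from
your mail is straightforward; a purely illustrative luma-only sketch
(hypothetical render_frame(), 4x4 box filter, bright square on a grey
background):

#include <string.h>

#define W  176
#define H  144
#define SS 4                         /* supersampling factor */

/* Draw one bright square at a subpixel position into a SSx-sized
 * buffer, then box-filter down to the actual test frame.  A real
 * generator would add motion, rotation, gradients etc. */
void render_frame(unsigned char *dst, double x, double y, double size)
{
    static unsigned char big[W * SS * H * SS];
    int i, j, u, v;

    memset(big, 128, sizeof(big));   /* grey background */
    for (j = (int)(y * SS); j < (int)((y + size) * SS); j++)
        for (i = (int)(x * SS); i < (int)((x + size) * SS); i++)
            if (i >= 0 && i < W * SS && j >= 0 && j < H * SS)
                big[j * W * SS + i] = 235;

    for (j = 0; j < H; j++)          /* SS x SS box filter */
        for (i = 0; i < W; i++) {
            int sum = 0;
            for (v = 0; v < SS; v++)
                for (u = 0; u < SS; u++)
                    sum += big[(j * SS + v) * W * SS + (i * SS + u)];
            dst[j * W + i] = (unsigned char)(sum / (SS * SS));
        }
}

Moving the square by fractions of a pixel between frames then exercises
the subpel motion estimation, which integer-positioned objects never
would.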

> I'm not sure what the right error metric would be.  An ideal error
> metric would take into account properties of the human visual system,
> but that's not realistic.  A simple error metric would be the
> sum-of-absolute-differences of the pixel values before and after the
> round-trip compression/decompression.  That has the drawback that a
> vertical line shifted over one pixel is a larger error than having the
> vertical line disappear altogether, which could be bad.  Perhaps
> taking the "min" of absolute differences between a "new" pixel and
> each of the 9 nearest "original" pixels would be a more reasonable
> error metric, since the cases where this is pathologically inaccurate
> are less likely to occur.

In theory, SAD (sum of absolute differences) or the more commonly used
PSNR/SSD (sum of squared differences) is bad, because as you describe,
e.g. shifting the image by one line gives a very poor score although to
the human eye it's no problem.
If you take into account, however, that you aren't checking arbitrary
input/output images but the result of MPEG-4 coding, then PSNR and the
others aren't so bad, because things like shifting by one line cannot
happen there. Or rather, if it does, it's a major bug in the code and a
low PSNR is the right verdict.
The discussion about the right error measure has been going on for
decades, including some threads on this list and in the forum. So maybe
you shouldn't start by trying to find the best one, but just keep
everything transparent, so whoever has an idea can plug in his own
measure.
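
As a sketch of what "transparent" could look like: make the measure a
function pointer, so whoever has an idea only has to write one function.
Below, plain SAD and a variant of your min-over-9-neighbours idea (names
are made up, luma only):

#include <stdlib.h>

#define W 176
#define H 144

typedef long (*metric_fn)(const unsigned char *orig,
                          const unsigned char *dec);

/* Plain sum of absolute differences. */
long metric_sad(const unsigned char *orig, const unsigned char *dec)
{
    long sum = 0;
    int i;
    for (i = 0; i < W * H; i++)
        sum += labs((long)orig[i] - (long)dec[i]);
    return sum;
}

/* For each decoded pixel, take the minimum absolute difference against
 * the 3x3 neighbourhood in the original, so that a one-pixel shift is
 * no longer punished as a huge error. */
long metric_min9(const unsigned char *orig, const unsigned char *dec)
{
    long sum = 0;
    int x, y, dx, dy;
    for (y = 0; y < H; y++)
        for (x = 0; x < W; x++) {
            long best = 255;
            for (dy = -1; dy <= 1; dy++)
                for (dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < W && ny >= 0 && ny < H) {
                        long d = labs((long)orig[ny * W + nx] -
                                      (long)dec[y * W + x]);
                        if (d < best)
                            best = d;
                    }
                }
            sum += best;
        }
    return sum;
}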

Anyway, if you have further questions or want your code to be integrated
into XviD (e.g. the examples dir, which is much too empty), then just
keep posting.


gruel


