[XviD-devel] Re: qpel smearing/noise problem

Wed, 8 Jan 2003 12:31:57 +0100 (CET)

Hi,

On Wed, 8 Jan 2003, Michael Militzer wrote:
> I see the following ways to solve the problem:
> 1) use xvid for decoding xvid content (encoder+decoder share same idct = no
> problems)

Hm, not good, since we want to create high quality MPEG-4 video, not high
quality XVID video.

> 2) implement the 132 inter macroblocks limit and check if this helps

Easiest way for testing: Make max_keyframe_interval=132 (for 
non-bframe-encoding) and see what happens. Most likely, this won't be
enough. 

> 3) reread the iso specs. I remember a sentence similar to this one: "idct
> has to meet IEEE1180 requirements with the following modifications: ..." -
> so our idct might be IEEE compliant but is maybe not compatible to the
> mpeg-4 standard...

--------------------------------------------------------------------
<quote>
The N by N inverse discrete transform shall conform to IEEE Standard
Specification for the Implementations of 8 by 8 Inverse Discrete Cosine
Transform, Std 1180-1990, December 6, 1990, with the following
modifications : 

1) In item (1) of subclause 3.2 of the IEEE specification,
the last sentence is replaced by: 
"Data sets of 1 000 000 (one million) blocks each should be generated for
(L=256, H=255), (L=H=5) and (L=384, H=383)."

2) The text of subclause 3.3 of the IEEE specification
is replaced by : "For any pixel location, the peak error shall not exceed 
2 in magnitude. There is no other accuracy requirement for this test." 

3) Let F be the set of 4096 blocks Bi[y][x] (i=0..4095) defined as follows:

  a) Bi[0][0] = i - 2048
  b) Bi[7][7] = 1 if Bi[0][0] is even, Bi[7][7] = 0 if Bi[0][0] is odd
  c) All other coefficients Bi[y][x] other than Bi[0][0] and Bi[7][7] are
     equal to 0

For each block Bi[y][x] that belongs to set F defined above, an IDCT that
claims to be compliant shall output a block f[y][x] that has a peak error
of 1 or less compared to the reference saturated mathematical 
integer-number IDCT f(x,y). 

In other words, | f[y][x] - f(x,y)| shall be <= 1 for all x and y.

NOTE 1 Clause 2.3 Std 1180-1990 "Considerations of Specifying IDCT
Mismatch Errors" requires the specification of periodic intra-picture
coding in order to control the accumulation of mismatch errors. Every
macroblock is required to be refreshed before it is coded 132 times as
predictive macroblocks. Macroblocks in B-pictures (and skipped macroblocks
in P-pictures) are excluded from the counting because they do not lead to
the accumulation of mismatch errors. This requirement is the same as
indicated in 1180-1990 for visual telephony according to ITU-T
Recommendation H.261.

NOTE 2 Whilst the IEEE IDCT standard mentioned above is a necessary
condition for the satisfactory implementation of the IDCT function it
should be understood that this is not sufficient. In particular attention
is drawn to the following sentence from subclause 5.4: "Where arithmetic
precision is not specified, such as the calculation of the IDCT, the
precision shall be sufficient so that significant errors do not occur in
the final integer values."
</quote>
----------------------------------------------------------------

I guess the programmers checked 2). But do we also fulfill 3)?
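
To make the check for 3) concrete, here is a rough sketch of how set F
could be generated and compared against a reference IDCT. It is only an
illustration, not XviD code: the names make_block, reference_idct and
max_peak_error are made up, the reference is a plain floating-point 8x8
IDCT rounded to nearest, and the saturation range [-256, 255] is my
assumption about what "saturated" means here. The IDCT under test would
be passed in as a function taking and returning an 8x8 list of ints.

```python
import math

N = 8

# Separable basis: BASIS[u][x] = c(u)/2 * cos((2x+1)*u*pi/16), c(0)=1/sqrt(2)
BASIS = [[(math.sqrt(0.5) if u == 0 else 1.0) * 0.5
          * math.cos((2 * x + 1) * u * math.pi / 16)
          for x in range(N)] for u in range(N)]


def make_block(i):
    """Block Bi of set F: DC = i - 2048, parity flag in [7][7], rest zero."""
    block = [[0] * N for _ in range(N)]
    block[0][0] = i - 2048
    block[7][7] = 1 if block[0][0] % 2 == 0 else 0
    return block


def reference_idct(block):
    """Floating-point IDCT, rounded to nearest, saturated to [-256, 255].

    The saturation range is an assumption, not taken from the quoted text.
    """
    # Row pass: transform along the horizontal frequency index u.
    tmp = [[sum(block[v][u] * BASIS[u][x] for u in range(N))
            for x in range(N)] for v in range(N)]
    out = [[0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            # Column pass: transform along the vertical frequency index v.
            s = sum(tmp[v][x] * BASIS[v][y] for v in range(N))
            out[y][x] = max(-256, min(255, math.floor(s + 0.5)))
    return out


def max_peak_error(idct_under_test, indices=range(4096)):
    """Largest |f[y][x] - f(x,y)| over the selected blocks of set F."""
    worst = 0
    for i in indices:
        block = make_block(i)
        ref = reference_idct(block)
        got = idct_under_test([row[:] for row in block])
        for y in range(N):
            for x in range(N):
                worst = max(worst, abs(got[y][x] - ref[y][x]))
    return worst
```

An IDCT would pass item 3) under these assumptions if
max_peak_error(our_idct) comes out as 1 or less over all 4096 blocks.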

> 4) try to agree on one idct implementation between all the various mpeg-4
> decoders (impossible, but at least xvid and ffmpeg could agree on one common
> idct, maybe divx also)

As with 1), quality should not depend on the decoder, as long as the
decoder is within the bounds of MPEG-4. 

So 5) Check if a) this happens with other encoders, too (ffmpeg? divx?) 
               b) we really do correct rounding in Qpel 

MPEG-4 surely was tested against these errors, and they came up with
the 132 predicted MBs rule. If this is not sufficient for us, it might
very well have something to do with our implementation. 
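
For reference, the 132 rule from NOTE 1 is per macroblock, not per frame,
so a proper implementation would need a counter for every macroblock
position. A minimal sketch of that bookkeeping, with hypothetical names
(RefreshTracker, must_code_intra, record) and my own reading that
"before it is coded 132 times" means the 132nd predictive coding has to
be replaced by an intra one:

```python
MAX_PREDICTED = 132  # limit quoted in NOTE 1


class RefreshTracker:
    """Counts predictive codings per macroblock position (illustrative)."""

    def __init__(self, mb_count):
        self.counts = [0] * mb_count

    def must_code_intra(self, mb):
        # True if coding this macroblock as inter now would be its
        # 132nd predictive coding since the last intra refresh.
        return self.counts[mb] + 1 >= MAX_PREDICTED

    def record(self, mb, mode):
        """Update the counter after a macroblock of a P- or I-picture."""
        if mode == "intra":
            self.counts[mb] = 0      # refreshed, start counting again
        elif mode == "inter":
            self.counts[mb] += 1     # accumulates IDCT mismatch
        elif mode == "skip":
            pass                     # skipped MBs are excluded by NOTE 1
        else:
            raise ValueError("unknown macroblock mode: %r" % (mode,))
```

Macroblocks in B-pictures would simply never be passed to record(),
since NOTE 1 excludes them from the count.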

gruel