[XviD-devel] B frame help

Sat Feb 8 10:42:14 CET 2003

Hi,

Thanks for the help!

 > > I have to write a bitstream encoder. I use the XviD decoder for
 > ____^^^^^^^
 > 
 > who is forcing you?
 > and what do you mean by bitstream encoder? ie. are you implementing your
 > own bitstream routines (putbits, etc.), or writing an MPEG video encoder
 > from scratch.

The force behind it is work: an MPEG4 encoder chip needs a C model
before it is cut into silicon. In the end the whole code will be thrown
away, but the HW implementation will be designed by modelling what the
SW does, and it will be tested against the C code.

The encoder gets the quantised DCT coefficients and the motion vectors
as well as all the necessary information about frame and MB type.
(These are working on silicon already.) From that it has to create
a properly formed MPEG4 stream. Due to the limits of the HW, not
everything has to be implemented: no quarter pixel, all vectors are
in [-32,31], no quadrant motion, only forward/backward vectors,
and so on.

Code efficiency is not a constraint, simplicity is. The code has to 
be written in a way that keeps the differences between HW and SW 
implementations in mind.

 > b-vops are very similar to p-vops; the major differences:
 > * macroblock mode/cbp vlcs

It is OK and XviD decodes them properly.

 > * motion vectors stored differently (thus compensation performed
 > differently)

This might be something I am doing wrong; in fact, I am sure that this
is where the major problem is. My motion vectors are always in the
[-32,31] range and fcode is always 1. Only a forward or a backward
vector is used, none of the other methods available for B frames.
There is only one vector per macroblock, no quadrants.
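
To make the fcode 1 case concrete, here is a minimal sketch of how a
vector differential is wrapped into the codable range and recovered
again. It is an illustration, not the actual model code: the names
mv_range, mv_diff_wrap and mv_reconstruct are made up, and it assumes
the [-32,31] values are in half-pel units. It deliberately says nothing
about how the predictor itself is chosen, which is the part in question.

```c
#include <assert.h>

/* Codable vector range for a given f_code (assumed 1..7):
   [-32 * 2^(f_code-1), 32 * 2^(f_code-1) - 1], i.e. [-32,31] for f_code 1. */
typedef struct {
    int low;
    int high;
    int range;
} MvRange;

MvRange mv_range(int f_code)
{
    MvRange r;
    int scale;

    assert(f_code >= 1 && f_code <= 7);
    scale   = 1 << (f_code - 1);
    r.low   = -32 * scale;
    r.high  = 32 * scale - 1;
    r.range = 64 * scale;
    return r;
}

/* Encoder side: differential between the vector and its predictor,
   wrapped so that it always fits into [low, high]. */
int mv_diff_wrap(int mv, int pred, int f_code)
{
    MvRange r = mv_range(f_code);
    int d = mv - pred;

    if (d < r.low)
        d += r.range;
    if (d > r.high)
        d -= r.range;
    return d;
}

/* Decoder side: add the differential to the predictor and wrap the
   sum back into [low, high]. */
int mv_reconstruct(int diff, int pred, int f_code)
{
    MvRange r = mv_range(f_code);
    int mv = pred + diff;

    if (mv < r.low)
        mv += r.range;
    if (mv > r.high)
        mv -= r.range;
    return mv;
}
```

With mv = -30 and pred = 20 the raw difference is -50, which wraps to 14;
the decoder computes 20 + 14 = 34 and wraps it back to -30. The round
trip only works if encoder and decoder use the same predictor.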

I use exactly the same encoder (and predictor) that I use for the
P frames, and that might be the heart of the problem. I do not want
to breach list etiquette, but if it is OK, I can attach a (reasonably
small) .tar.gz file with the 3 test frames and the result frames.
Maybe someone who has worked a lot with B frames could look at them,
recognise the problem immediately and tell me what part of the
standard I missed or misinterpreted. The images are small, 320x240
in PPM format, and mostly black, so they compress very well. Also,
when I generate the bitstream I create a bitstream debug file,
which looks like this:

 ...
 116392 VOP header --------------------- 0000_0000_0000_0000_0000_0001_1011_0110
 116424 B frame ------------------------ 10
 116426 Modulo Time Base --------------- 0
 116427 Marker ------------------------- 1
 116428 Time increment B --------------- 0000_1
 116433 Marker ------------------------- 1
 116434 VOP coded ---------------------- 1
 116435 Ftype DC VLC threshold --------- 000
 116438 Scale -------------------------- 1000_0
 116443 Motion pred. f_code_forward ---- 001
 116446 Motion pred. f_code_backward --- 001
 116449 Modb = mb_type + cbpb ---------- 00
 116451 MB type backward --------------- 001
 116454 CBPB --------------------------- 0001_00
 116460 DBquant ------------------------ 0
 116461 Motion vector ------------------ 1
 116462 Motion vector ------------------ 1
 116463 Escape ------------------------- 0000_011
 116470 Esc-3 -------------------------- 11
 116472 Last --------------------------- 1
 116473 Run ---------------------------- 0000_00
 116479 Marker ------------------------- 1
 116480 Signed level ------------------- 0000_0011_1111
 116492 Marker ------------------------- 1
 116493 Skipping B for skipped P ------- 
 116493 Skipping B for skipped P ------- 
 ...

which I could supply. However, I think the problem is the formation
and interpretation of the vectors. XviD remains in sync with the 
bitstream. Thus, I think, syntactically the bitstream is OK, it
is the actual values that it encodes which are incorrect.
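
A listing in this format can be produced by logging every field as it
is written. The sketch below is illustrative only, not the actual model
code: the names BitWriter, put_bits, format_bits and format_line are
made up for the example, and the 32-column label field is inferred from
the excerpt above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LABEL_WIDTH 32   /* label + space + dash padding (inferred) */

typedef struct {
    unsigned char *buf;  /* output bytes, must start zeroed */
    size_t cap;          /* capacity of buf in bytes */
    unsigned long pos;   /* number of bits written so far */
    FILE *log;           /* debug listing, NULL to disable */
} BitWriter;

/* Render the low n bits of value, MSB first, in groups of four
   separated by underscores. out needs room for 40 chars when n is 32. */
void format_bits(unsigned long value, int n, char *out)
{
    int i, k = 0;

    for (i = 0; i < n; i++) {
        if (i > 0 && i % 4 == 0)
            out[k++] = '_';
        out[k++] = ((value >> (n - 1 - i)) & 1UL) ? '1' : '0';
    }
    out[k] = '\0';
}

/* Build one listing line: bit position, dash-padded label, bit pattern. */
void format_line(unsigned long pos, const char *label,
                 unsigned long value, int n, char *out, size_t out_size)
{
    char field[LABEL_WIDTH + 1];
    char bits[48];
    size_t len = strlen(label);
    size_t i;

    if (len > LABEL_WIDTH - 2)
        len = LABEL_WIDTH - 2;           /* keep room for " -" */
    memcpy(field, label, len);
    field[len] = ' ';
    for (i = len + 1; i < LABEL_WIDTH; i++)
        field[i] = '-';
    field[LABEL_WIDTH] = '\0';

    format_bits(value, n, bits);
    snprintf(out, out_size, "%lu %s %s", pos, field, bits);
}

/* Append the low n bits of value (MSB first) and log the field. */
void put_bits(BitWriter *w, const char *label, unsigned long value, int n)
{
    char line[128];
    int i;

    assert(n >= 0 && n <= 32);
    assert((w->pos + (unsigned long)n + 7) / 8 <= w->cap);

    if (w->log != NULL) {
        format_line(w->pos, label, value, n, line, sizeof line);
        fprintf(w->log, " %s\n", line);
    }
    for (i = n - 1; i >= 0; i--) {
        if ((value >> i) & 1UL)
            w->buf[w->pos >> 3] |= (unsigned char)(0x80u >> (w->pos & 7));
        w->pos++;
    }
}
```

Calling put_bits(&w, "B frame", 2, 2) at bit position 116424 would log
the "B frame" line shown above. Keeping the log call inside put_bits
means the listing cannot drift out of step with the bits actually
emitted.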

 > * co-located block skipping semantics

Well, I also have some problems with that.
According to the standard, MBs that were not coded in the last P
frame should be skipped in all B frames that have that P as their
backward reference.

I have to apologise for the long ASCII art, but here is a test case.
My test image is a 16x16 square moving by 8,8 in every frame. In
the ASCII art a macroblock is 4x4 chars.

The frame sequence is I1 b0 P3 B2 P5 B4 I7 B6 P9 B8 ... where b0 is 
thrown away.

####....             I1
####....
####....
####....
........
........
........
........

........             P3, the square moved 16 pixels
........
........
........
....####
....####
....####
....####

........             The B between the two. The top right and bottom
........             left of the square are missing because those
..##....             macroblocks were not encoded in the P and hence
..##....             they are not encoded in the B either, even though
....##..             they changed. 
....##..
........
........

In addition, the standard (and the XviD code as well) explicitly refers
to the last P frame and not to the last reference frame.
Therefore, in this frame sequence

I1 b0 P3 B2 P5 B4 I7 B6 P9 B8 

when B4 (displayed between P3 and P5) is encoded, its skip map is
derived from P5. However, when B6 (which in display order is between
P5 and I7) is encoded, its skip map is still derived from P5, that is,
from its forward reference frame instead of its backward one, because
P5 was the last *P* frame received. I don't know whether I have badly
misinterpreted the standard or whether this is indeed the case.
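
To spell out the two readings, here is a small sketch that reports, for
each B frame in coding order, which earlier frame its skip map would
come from. It only models the interpretation described above and an
alternative one; it does not claim to be what the standard or XviD
actually does. The function name skip_map_source and the flag
any_reference are made up for the example.

```c
/* types: frame types in coding order, one char each ('I', 'P' or 'B').
   src[i] receives the index of the frame whose "not coded" map a B frame
   at position i would consult, or -1 for non-B frames and for B frames
   with no source yet.
   any_reference == 0: only P frames update the map (the reading above).
   any_reference != 0: I frames update it too (the alternative reading). */
void skip_map_source(const char *types, int n, int any_reference, int *src)
{
    int last = -1;
    int i;

    for (i = 0; i < n; i++) {
        src[i] = -1;
        if (types[i] == 'B')
            src[i] = last;
        else if (types[i] == 'P' || (types[i] == 'I' && any_reference))
            last = i;
    }
}
```

For the sequence I1 b0 P3 B2 P5 B4 I7 B6 P9 B8, i.e. the string
"IBPBPBIBPB", B6 sits at index 7. With any_reference set to 0 its
source is index 4 (P5); with any_reference set to 1 it is index 6 (I7),
which as an intra frame has no uncoded macroblocks.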

Thanks in advance,

Zoltan