[XviD-devel] streaming mpeg-4 / decoding b-frames

Christoph Lampert xvid-devel@xvid.org
Mon, 2 Sep 2002 09:58:14 +0200 (CEST)


On Sat, 31 Aug 2002, shatty wrote:
> I am trying to ensure that there will be no problems with
> bidirectionally encoded frames in the new media kit api for 
> OpenBeOS. Internally the media kit has a set of nodes and wires 
> similar to various other media APIs. (directshow, gstreamer)  
> The data is passed in a set of buffers that are handed from 
> node to node.  The media kit is oriented around low latency 
> media manipulation, so each node maintains some latency 
> information.
> 
> Which brings me to the b-frames/streaming mpeg4.  If we do the
> naive thing and simply pass the frames in the order in which 
> they are to be presented, we have a big latency hit if a group 
> of b-frames occurs.  We don't have to take this hit if I am not
> mistaken, because each b-frame has only the prior frame and
> the next key frame as its reference. (right?)
> 
> So we could pass the reference frame ahead of the group of
> b-frames and then it would be available for decoding the 
> b-frames, or we could pass it after the first one, for 
> example.  Either of these seems better than the naive approach.
> 
> My question is: how do people handle this now with streaming
> mpeg4 and also, how is this handled in the xvidcore lib?
> Does the lib expect the frames out of order?  Please feel free
> to respond offlist/onlist/IRC as you like.

In case nobody else answers...

---------------- DECODING -------------------------------

The MPEG standard specifies the order in which frames have to be 
passed. If the GOP is IBBPBBPBBP, i.e. in display order

0 1 2 3 4 5 6 7 8 9 
I B B P B B P B B P

then the frames are stored in the bitstream (and transmitted, of course)
as 

0 3 1 2 6 4 5 9 7 8
I P B B P B B P B B 

So whenever an encoded frame arrives, all reference frames needed to
decode it have already arrived. 
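
A minimal sketch of this reordering, assuming a GOP is given as a string of
frame types in display order. The function names are hypothetical and this is
not the xvidcore API; it only illustrates the rule that each I/P frame is
moved ahead of the B-frames that precede it in display order, and how a
decoder can restore display order by holding back the most recent reference.

```python
def display_to_coded_order(frame_types):
    """Return display indices in bitstream order for a GOP like 'IBBPBBP'."""
    coded = []
    pending_b = []  # B-frames waiting for their future reference frame
    for idx, ftype in enumerate(frame_types):
        if ftype == "B":
            pending_b.append(idx)
        else:
            # I or P: the reference goes first, then the buffered B-frames
            coded.append(idx)
            coded.extend(pending_b)
            pending_b = []
    # Assumption: trailing B-frames with no later reference are emitted as-is
    coded.extend(pending_b)
    return coded


def coded_to_display_order(coded, frame_types):
    """Decoder side: hold each reference until the next reference arrives."""
    shown = []
    held_ref = None
    for idx in coded:
        if frame_types[idx] == "B":
            shown.append(idx)          # B-frames are shown immediately
        else:
            if held_ref is not None:
                shown.append(held_ref)  # previous reference is now due
            held_ref = idx
    if held_ref is not None:
        shown.append(held_ref)
    return shown


gop = "IBBPBBPBBP"
coded = display_to_coded_order(gop)
print(coded)
print(coded_to_display_order(coded, gop))
```

For the GOP above, the first call reproduces the bitstream order shown in
the table and the second call recovers the display order 0..9.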

----------------ENCODING-------------------------------

Encoding is more difficult. XVID (and all other codecs I know of)
uses the normal display order as input order. Frames that are supposed to
become B-frames are buffered until their other (future) reference frame has
been encoded. So in the beginning there has to be a delay of N frames if up
to N consecutive B-frames are allowed (that's important for A-V sync!)

viewing:              0   1   2   3   4   5   6   7   8   9 
input to encoder:     0   1   2   3   4   5   6   7   8   9
action in encoder:    -   -   e0  e3  e1  e2  e6  e4  e5  e9  e7  e8
output from encoder:  -   -   0   3   1   2   6   4   5   9   7   8

so there's a delay of 2 empty frames in the beginning. One could also choose 

action in encoder:    e0  -   -   e3  e1  e2  e6  e4  e5  e9  e7  e8
output from encoder:  0   -   -   3   1   2   6   4   5   9   7   8

then there would be an image right at the beginning, but it would be shown
for 3 timesteps instead of only 1 (I think DivX5 did that, but DivX5 only
does 1 B-frame). 
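
The first schedule can be sketched as a small simulation. It assumes one
input frame arrives per timestep and at most one frame is encoded per
timestep; `encoder_schedule` and its parameters are hypothetical names, not
part of xvidcore. The check inside shows why the initial delay cannot be
shorter than the maximum number of consecutive B-frames.

```python
def encoder_schedule(frame_types, max_bframes):
    """Return the frame encoded at each timestep (None = nothing output yet)."""
    coded = []
    pending_b = []
    for idx, ftype in enumerate(frame_types):
        if ftype == "B":
            pending_b.append(idx)
        else:
            coded.append(idx)       # reference frame first
            coded.extend(pending_b)  # then the B-frames buffered before it
            pending_b = []
    coded.extend(pending_b)

    # Delay the whole output by max_bframes timesteps
    schedule = [None] * max_bframes + coded

    # Frame idx arrives at timestep idx, so it cannot be encoded earlier
    for t, idx in enumerate(schedule):
        if idx is not None and idx > t:
            raise ValueError(
                f"frame {idx} scheduled at timestep {t}, before it arrives"
            )
    return schedule


print(encoder_schedule("IBBPBBPBBP", 2))
```

With `max_bframes=2` this matches the "output from encoder" row of the first
table; with a delay of 0 or 1 the check fails, because frame 3 would have to
be encoded before it has been handed to the encoder.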


Btw. it's not a good idea to let the application reorder the frames before
sending them to the codec, because there is a "design flaw" (or simply a
bug) in the MPEG-4 standard that makes it necessary to look at the
intermediate B-frames before taking the SKIP decision at the _following_
P-frame. So you cannot _encode_ frames 1 and 2 before having encoded 3, but
they must be available when encoding 3 in order to check whether blocks in
frame 3 can be SKIPped or not. 
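
A rough sketch of such a check, under loudly stated assumptions: blocks are
modelled as flat lists of pixel values, "unchanged" is decided by a
sum-of-absolute-differences threshold, and the rule assumed here is that a
block may only be SKIPped in the P-frame if the co-located block is also
unchanged in every buffered B-frame in between. All names are hypothetical
and this is not how xvidcore necessarily implements it.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(block_a, block_b))


def can_skip_in_p_frame(ref_block, p_block, b_blocks, threshold=0):
    """Decide SKIP for a P-frame block, looking at the buffered B-frames too.

    ref_block: co-located block in the previous reference frame (e.g. frame 0)
    p_block:   co-located block in the P-frame being encoded (e.g. frame 3)
    b_blocks:  co-located blocks in the buffered B-frames (e.g. frames 1, 2)
    """
    if sad(ref_block, p_block) > threshold:
        return False
    # Assumed rule: the intermediate B-frames must be unchanged as well
    return all(sad(ref_block, b) <= threshold for b in b_blocks)


ref = [10, 10, 10, 10]
p = [10, 10, 10, 10]
print(can_skip_in_p_frame(ref, p, [[10, 10, 10, 10], [10, 10, 10, 10]]))
print(can_skip_in_p_frame(ref, p, [[10, 10, 10, 10], [10, 50, 10, 10]]))
```

The second call returns False even though frame 3 itself matches the
reference, which is why frames 1 and 2 have to be available to the codec
while frame 3 is being encoded.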



Christoph