[XviD-devel] Decoding stream

Tue Jun 24 15:13:57 CEST 2008

Hi Marco,

normally you don't have to care about any of these internals or about any
of these start codes. In almost all real-world applications the video data
is not stored in a raw stream (the xvid_decraw example expects one just for
simplicity). As soon as there is also an audio stream, the audio and video
streams are almost certainly muxed together into a container format such as
AVI, MP4, OGM or MKV.

To play such streams you first need to separate the audio and video streams.
This is done by a demuxer. Usually the demuxer hands you the video data
frame by frame, so you just need to pass what you get from the demuxer to
the xvid decoder and that's it.

Now if you really have just video-only raw streams, the question is where
you get them from. If your own application encoded the video that you're
trying to decode, you should consider using a container format like AVI
nonetheless, so that you can easily distinguish between single frames
(store one frame per AVI chunk). Or, if you don't need to share videos
encoded with your application with other apps, you can invent your own
ultra-simplistic container format: you could e.g. simply write the length
of each frame (you get that from the encoder) into your raw file right
before the actual frame data. Then upon decoding you can easily determine
whether your buffer is sufficiently filled to decode the next frame.
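
A minimal sketch of such a length-prefixed file, written in Python for
brevity (the same layout carries over directly to C). The 4-byte
little-endian prefix and the helper names write_frame and read_frame are
assumptions made for illustration; they are not part of the xvid API, and
the byte strings merely stand in for real encoder output.

```python
import io
import struct

# Hypothetical layout: each frame is stored as a 4-byte little-endian length
# followed by exactly that many bytes of compressed frame data.
LENGTH_PREFIX = struct.Struct("<I")


def write_frame(out, frame):
    """Append one compressed frame, preceded by its length."""
    out.write(LENGTH_PREFIX.pack(len(frame)))
    out.write(frame)


def read_frame(inp):
    """Return the next complete frame, or None at a clean end of stream."""
    header = inp.read(LENGTH_PREFIX.size)
    if not header:
        return None
    if len(header) < LENGTH_PREFIX.size:
        raise ValueError("truncated length prefix")
    (length,) = LENGTH_PREFIX.unpack(header)
    frame = inp.read(length)
    if len(frame) < length:
        raise ValueError("truncated frame data")
    return frame


# Round trip with placeholder byte strings standing in for encoder output.
stream = io.BytesIO()
for payload in (b"frame-0", b"", b"frame-two"):
    write_frame(stream, payload)
stream.seek(0)
frames = []
while (frame := read_frame(stream)) is not None:
    frames.append(frame)
```

Because the reader knows the length before it touches the frame data, it
can always hand the decoder exactly one complete frame per call.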

Another solution, probably the lamest one, is to make your buffer large
enough and always keep it sufficiently filled. If you're unsure what
'sufficiently filled' means, you could assume that a compressed frame will
not be larger than an uncompressed one. So always having at least
width*height*1.5 bytes (the size of one uncompressed YUV 4:2:0 frame), or
the entire remaining stream if that's less, in your buffer should do too.
Of course that's not really how it should be done, and it is a waste of
memory as well.
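
To put a number on that rule of thumb, here is a small sketch in Python.
The function names are made up for illustration, and the bound itself is
only the assumption stated above (compressed is not larger than
uncompressed), not a guarantee given by the decoder.

```python
def worst_case_frame_bytes(width, height):
    """Assumed upper bound: width*height*1.5 bytes, rounded up."""
    return (width * height * 3 + 1) // 2


def min_buffer_fill(width, height, bytes_remaining):
    """Bytes to keep buffered: the bound, or whatever is left of the stream."""
    return min(worst_case_frame_bytes(width, height), bytes_remaining)


# A 640x480 stream would need 460800 bytes buffered before every decoder call.
vga_bound = worst_case_frame_bytes(640, 480)
```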

With regard to the start codes, their meaning is defined in detail in the
MPEG-4 Visual standard (ISO/IEC 14496-2). Again, you normally don't need to
know them. Most likely there are other, simpler ways to achieve your goal
(as outlined above).
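
For orientation only, this is roughly what looking for start codes in a raw
buffer amounts to, sketched in Python. Every MPEG-4 Visual start code begins
with the byte prefix 00 00 01; the three values below are the ones named in
the question. The function find_start_codes and the sample bytes are
hypothetical. Note that this only locates VOP start codes: any headers that
precede a VOP belong to the same frame, so this is not a complete frame
splitter.

```python
# Start code values as defined by MPEG-4 Visual (32 bits, prefix 00 00 01).
VISOBJSEQ_START_CODE = b"\x00\x00\x01\xb0"
USERDATA_START_CODE = b"\x00\x00\x01\xb2"
VOP_START_CODE = b"\x00\x00\x01\xb6"


def find_start_codes(data, code=VOP_START_CODE):
    """Return the byte offsets at which `code` occurs in `data`."""
    offsets = []
    pos = data.find(code)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(code, pos + len(code))
    return offsets


# Made-up bytes: a sequence header stub followed by two VOPs with dummy payload.
sample = (VISOBJSEQ_START_CODE + b"\x01"
          + VOP_START_CODE + b"aaaa"
          + VOP_START_CODE + b"bb")
vop_offsets = find_start_codes(sample)
```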

Regards,
Michael

Quoting Marco Filippini <marco.filippini at cisitaly.com>:

> Michael,
>
> since I'm reading a stream of encoded frames, how do I programmatically get
> a full frame to feed the decoder?
> What are the boundaries of a frame, what's the meaning of the bitstream
> codes (VISOBJSEQ_START_CODE, USERDATA_START_CODE, VOP_START_CODE etc.) and
> how do I have to handle them?
> I'm a newbie to XviD. When I started to use the API I assumed I could rely
> on the information in the xvid.h file without any knowledge of the
> bitstream, but obviously I was wrong.
> Any help in prototyping my code is appreciated.
>
> Best regards
>
> Marco Filippini
>
> -----Original Message-----
> From: xvid-devel-bounces at xvid.org [mailto:xvid-devel-bounces at xvid.org] On
> Behalf Of Michael Militzer
> Sent: Tuesday, 24 June 2008 11:09
> To: xvid-devel at xvid.org
> Subject: Re: [XviD-devel] Decoding stream
>
> Stephan,
>
> well, I'm not sure if I made myself clear enough: the xvid decoder expects
> to be fed a full frame of data, but if not enough data is provided or the
> data is corrupt, xvid will take steps to prevent reading outside the
> provided buffer, so it should not crash. However, the decoder will be in an
> invalid state and the decoded picture(s) will be broken.
>
> Now from an API user's point of view, I agree that it would be desirable to
> be bound by the fewest possible constraints and to not require any
> knowledge of the bitstream itself.
>
> Video decoding however is a highly performance-critical task, so you need
> to find a compromise: an API that is as comfortable to use as possible but
> does not sacrifice performance. If we allowed the user to feed arbitrary
> amounts of data to the decoder, we'd have to cope with this internally, and
> I can think of mainly two possible solutions:
>
> 1) The decoder maintains its own buffer and preparses the stream, similar
> to what I proposed in an earlier mail, to ensure that sufficient data is
> available. The copying and parsing operations involved will mean a major
> performance loss however, especially for high-bitrate content. For most
> applications it will also be totally unnecessary because the demuxer can
> do the segmentation to frame boundaries with little or no overhead.
>
> 2) If the decoder detects that not enough data was provided, it could
> rewind to a safe state, return and ask the app for more data. This involves
> some implementation complexity however, because unfortunately the decoder
> itself cannot tell a priori (without preparsing the data) whether the
> provided input buffer contains sufficient data to decode the next frame.
> So the decoder will only realize at some point in the middle of decoding
> that it has run out of data. It would then have to rewind to a safe state
> and ask for more data.
>
> This should work fine as long as it happens only rarely. If the app always
> provides only a little data at once, and then also provides only a small
> additional amount when the decoder returns asking for more, the decoder
> might rewind again and again, performing many tasks twice and thus harming
> performance.
>
> So I think that the requirement to provide a full frame of data is a good
> compromise with regard to decoder performance. Also, it's usually not a big
> deal on the app side because the demuxer can do the segmentation to frame
> boundaries with no overhead.
>
> Of course, the decoder should be robust to errors, and ideally it should
> also be able to recover from errors without harming the quality of the
> output pictures if at all possible. Option 2) might be a good solution here
> if insufficient data is provided to the decoder only rarely; however, the
> implementation effort may exceed the occasional benefit...
>
> Regards,
> Michael
>
>
> Quoting Stephan Assmus <superstippi at gmx.de>:
>
>>
>> Michael Militzer wrote:
>>> it is correct that the xvid decoder expects to be fed with full frames
>>> all the time. If you provide less data the decoder may read outside your
>>> buffer and crash or detect the problem, incorrectly decode the frame and
>>> return (the intended behavior).
>>>
>>> So to fix your problem you need to ensure that your buffer contains at
>>> least one full frame of data when you call the decoder.
>>
>> I think this situation should be fixed.
>>
>> 1) If an API call gets the buffer size, it may as well respect it no
>> matter what.
>>
>> 2) This is speculation, but it sounds like the decoder depends on the data
>> in the chunk being 100% valid, or it may read outside the provided
>> boundaries, even if those indeed contain a full frame. In real life, the
>> data may be corrupt, and even then I would like it to not crash.
>>
>> Hopefully, this doesn't come across as harsh, I just mean to express an
>> opinion from an API user's point of view. It's just something that I
>> believe belongs in the library and not in the API user code, otherwise this
>> just means duplicated code with varying degrees of quality, and with 2) I
>> am demonstrating that the API user has no chance of handling corrupt data
>> correctly anyway. The most convenient way would be for xvid to cache
>> incomplete frame data. As an alternative, it should be made clear in the
>> API that only complete frame data may be passed for it to produce output,
>> but even then xvid should respect the provided buffer size no matter what
>> and return an error if it is too small.
>>
>> Best regards,
>> -Stephan
>> _______________________________________________
>> XviD-devel mailing list
>> XviD-devel at xvid.org
>> http://list.xvid.org/mailman/listinfo/xvid-devel