[XviD-devel] single-function API

Fri Jan 30 12:10:40 CET 2004

Hi,

On Thu, 29 Jan 2004, tomas carnecky wrote:
> I've a question about B-frames:
> The 'B' stands for bidirectional, that means that the B-frame depends
> on the previous and following frame, but how does it work with the
> encoder/decoder? You pass a raw frame to the encoder and you get the
> bitstream back. So it might happen that for 10 frames the bitstream is
> 'empty' and that you then get 8000 kbits at once. 

Do you mean "packed mode"? That is in fact just an ugly workaround.
In theory, there is an initial delay to consume input, but after that
one frame is put out at every time step.
And your numbers are a bit unlikely. Nobody uses 9 B-frames. But even if
they did, 8000 kbits is just 1 megabyte of data. That's peanuts; every
uncompressed input frame is roughly that big.
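
To illustrate where the initial delay comes from, here is a minimal sketch
that reorders frames from display order into coding order. It assumes a
fixed number of B-frames between reference frames, which is a
simplification (a real encoder may choose frame types adaptively), and
coding_order() is a hypothetical helper, not part of the XviD API.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `order` with display-order frame indices arranged in coding order,
 * assuming a fixed pattern of `bframes` B-frames between reference frames.
 * `order` must have room for n_frames entries.
 * Hypothetical helper for illustration only. */
static void coding_order(size_t n_frames, size_t bframes, size_t *order)
{
    size_t out = 0;
    size_t prev_ref = 0;

    assert(order != NULL);
    if (n_frames == 0)
        return;

    order[out++] = 0;                 /* first frame is an I-frame */
    while (prev_ref + 1 < n_frames) {
        size_t next_ref = prev_ref + bframes + 1;
        if (next_ref > n_frames - 1)
            next_ref = n_frames - 1;  /* last group may be shorter */

        order[out++] = next_ref;      /* the reference frame is coded first */
        for (size_t b = prev_ref + 1; b < next_ref; b++)
            order[out++] = b;         /* then the B-frames that depend on it */
        prev_ref = next_ref;
    }
}
```

With 7 frames and 2 B-frames, the display order 0 1 2 3 4 5 6 becomes the
coding order 0 3 1 2 6 4 5. The encoder has to see frame 3 before it can
emit anything for frames 1 and 2, and that is the delay.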

> Wouldn't it be better
> to 'push' the frames to the encoder and then, let's say every 10 frames,
> get the bitstream? 

This is (of course) how the bitstream itself is created: pushed bit-by-bit
into a buffer, and only at the end written out as a sequence of bytes,
because bit-by-bit access to disk would be impossible and also too slow.
I don't think it would make a difference whether that buffer is emptied
once per frame or every 10 frames.
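
As a rough sketch of the idea, a bit writer collects bits in an
accumulator and stores only whole bytes into the buffer. All names here
(BitWriter, bw_put, bw_flush) are made up for illustration and are not
the actual XviD bitstream code. Padding the last byte with zero bits is
also a simplification; a real stream follows its own stuffing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit writer; names are illustrative only. */
typedef struct {
    uint8_t *buf;   /* output byte buffer */
    size_t   cap;   /* capacity of buf in bytes */
    size_t   pos;   /* number of complete bytes written so far */
    uint32_t acc;   /* pending bits, right-aligned */
    unsigned nacc;  /* number of pending bits in acc (0..7 between calls) */
} BitWriter;

static void bw_init(BitWriter *bw, uint8_t *buf, size_t cap)
{
    bw->buf = buf;
    bw->cap = cap;
    bw->pos = 0;
    bw->acc = 0;
    bw->nacc = 0;
}

/* Append the low `nbits` bits of `value`, most significant bit first. */
static void bw_put(BitWriter *bw, uint32_t value, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 24);
    bw->acc = (bw->acc << nbits) | (value & ((1u << nbits) - 1u));
    bw->nacc += nbits;
    while (bw->nacc >= 8) {           /* store every complete byte */
        assert(bw->pos < bw->cap);
        bw->nacc -= 8;
        bw->buf[bw->pos++] = (uint8_t)(bw->acc >> bw->nacc);
    }
    bw->acc &= (1u << bw->nacc) - 1u; /* keep only the pending bits */
}

/* Pad the final partial byte with zero bits; return total bytes written. */
static size_t bw_flush(BitWriter *bw)
{
    if (bw->nacc > 0) {
        assert(bw->pos < bw->cap);
        bw->buf[bw->pos++] = (uint8_t)(bw->acc << (8 - bw->nacc));
        bw->acc = 0;
        bw->nacc = 0;
    }
    return bw->pos;
}
```

Whether the caller hands buf to the file layer after every frame or
after every 10 frames does not change anything in this code.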

> I think it's faster to call write([to file]) once for
> a while, but with many data than for every frame with only a few bytes.

Hopefully, write([to file]) is a cached process, so the access
itself should not be slower than for 10 frames, even when writing to a
floppy disk or something. The only overhead is from the call itself.
But since it's really _only_ once per frame, this overhead is
negligible. Almost everything that happens only once per frame can be
neglected, compared to something that is called for every macroblock or
even every pixel.
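
To put numbers on that, here is a small sketch that counts 16x16
macroblocks and pixels per frame. The helper names are hypothetical, and
the 720x576 frame size used below is just an assumed example.

```c
#include <assert.h>

/* Number of 16x16 macroblocks needed to cover a frame (edges rounded up). */
static long macroblocks_per_frame(int width, int height)
{
    assert(width > 0 && height > 0);
    return (long)((width + 15) / 16) * ((height + 15) / 16);
}

/* Number of pixels in the luma plane of a frame. */
static long pixels_per_frame(int width, int height)
{
    assert(width > 0 && height > 0);
    return (long)width * height;
}
```

For 720x576 this gives 1620 macroblocks and 414720 luma pixels per
frame, so one extra call per frame is lost in the noise next to anything
done per macroblock or per pixel.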

But there's another reason: the AVI container needs exactly 1 frame per
chunk, otherwise decoding will stutter on many players.

> I think then it would be easier to have _real_ SMP support (encoding a
> movie with several threads):
> First thread analyzes the frames and passes them together with some
> statistics/info further, to the second thread, to the 'core' of the
> encoder.

It is in fact much, much more difficult than that.
First of all, the core is not prepared for multithreading. I once had
multithreaded ME, but we threw it out because it made the code much
less readable, and OS-independent threading is still a pain in the a**.

Second, you cannot just split analysis and encoding and perform them in
parallel. Analysis of the current frame depends on the previous frame: for
efficiency, on its ME vectors etc., but also, and one cannot get rid of
this, on its image data. That image data is the internal representation of
the previously decoded frame, after DCT/quantization/iDCT; it's not the
input material of the previous frame.
Of course, you can create workarounds, but I'm sure results will be worse.
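
A tiny sketch of why the reconstructed data differs from the input: a
value that goes through quantization and back does not come out
unchanged. This uses a plain uniform quantizer purely as an assumption
for illustration; it is not XviD's actual quantization, and the function
names are made up.

```c
#include <assert.h>

/* Uniform quantizer with step `q`, rounding to nearest.
 * Restricted to non-negative input to keep the sketch short. */
static int quantize(int coeff, int q)
{
    assert(q > 0 && coeff >= 0);
    return (coeff + q / 2) / q;
}

/* Inverse of quantize(): map a level back to a coefficient value. */
static int dequantize(int level, int q)
{
    assert(q > 0);
    return level * q;
}
```

With a step of 8, the coefficient 37 becomes level 5 and is reconstructed
as 40. The next frame has to be predicted from the 40, because that is
what the decoder will have, and it only exists once the previous frame
has been fully encoded.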

What one could do is encode sequences of B-frames in parallel, because
they don't rely on each other. But we would need to change quite a lot of
internal stuff for that, allocate space for everything dynamically etc. 

I rather believe that OpenMP, letting the compiler decide on threads,
could be a workable way to speed things up on SMP/SMT machines. But I
haven't tried much yet.
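
For what it's worth, a minimal sketch of what that could look like: a sum
of absolute differences over a whole plane, with the row loop marked for
OpenMP. plane_sad() is a hypothetical function, not taken from the XviD
source. Without OpenMP enabled in the compiler, the pragma is ignored and
the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two 8-bit planes of equal size.
 * Rows are independent of each other, so the outer loop can be split
 * across threads; the reduction clause combines the per-thread sums. */
static uint64_t plane_sad(const uint8_t *cur, const uint8_t *ref,
                          int width, int height)
{
    uint64_t total = 0;

    assert(cur != NULL && ref != NULL && width > 0 && height > 0);

    #pragma omp parallel for reduction(+:total)
    for (int y = 0; y < height; y++) {
        const uint8_t *c = cur + (size_t)y * (size_t)width;
        const uint8_t *r = ref + (size_t)y * (size_t)width;
        uint64_t row = 0;

        for (int x = 0; x < width; x++)
            row += (uint64_t)abs((int)c[x] - (int)r[x]);
        total += row;
    }
    return total;
}
```

The nice part is that the code stays readable and the threading details
are left to the compiler and runtime, with no OS-specific thread calls.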

chl