[XviD-devel] colorspaces

Tue, 15 Oct 2002 00:26:36 +0200

Hi,

> how important is it that xvid doesn't overflow (read/write outside)
> the source and destination image buffers?

I'd say it's very important.

> for example: yuv_to_rgb32_mmx writes out 32 bytes of rgb at a time,
> which equates to 8 pixels. if your video is only 10 pixels wide, then
> it's possible that 24 bytes are written past the end of the image buffer.
> in theory, a yuv_to_rgb32_sse2 function would be capable of writing out
> 64 bytes at a time.
>
> i've been thinking about this a lot, and have come up with a solution.
> basically it involves modifying the colorspace conversion functions,
> such that they only operate on rows (the conversion functions
> currently operate on the whole image), and then writing an
> image_convert() function which loops over the rows, calling the
> conversion function.
>
> e.g.
> image_convert(...)
> {
>    int opt_bytes = /* portion of each row that can be safely converted
>                       using the mmx function */
>    int c_bytes = /* remainder of the row to be converted using the
>                       plain-c function */
>
>    if (height < 0)
>    {
>        /* perform image ptr/stride flip  */
>    }
>
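>    for (row = 0; row < height; row += 2) /* convert two rows at a time */
>    {
>       opt_colorspaceFunc(y,u,v, dst, opt_bytes);
>       /* plain-c call starts where the optimised call stopped;
>          opt = number of pixels covered by opt_bytes */
>       c_colorspaceFunc(y+opt, u+opt/2, v+opt/2, dst+opt_bytes, c_bytes);
>       y += 2 * y_stride       /* two luma rows per iteration */
>       u,v += uv_stride        /* one chroma row per iteration */
>       dst += 2 * dst_stride
>    }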
> }
>
> image_convert() then determines how many bytes of each row can be
> safely converted using the optimised (mmx,sse,etc) function.
> the portion that can't be safely converted is converted using
> the plain-c function.

What about odd heights, then? Another extra function would be needed,
right?

> for progressive frames, we operate on two rows at a time, because
> the u,v planes are vertically subsampled by 2. for interlaced frames,
> we need to operate on four rows at a time.
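
The two-row versus four-row grouping can be sketched as a mapping from
luma row to chroma row. This assumes the common 4:2:0 layout in which an
interlaced frame subsamples each field separately, so that luma rows 0 and
2 (top field) share one chroma row and rows 1 and 3 (bottom field) share
the next. The function names are made up for illustration.

```c
/* Progressive 4:2:0: luma rows 2k and 2k+1 share chroma row k. */
int chroma_row_progressive(int row)
{
    return row / 2;
}

/* Interlaced 4:2:0 (assumed field-based subsampling): within each group
   of four luma rows, the two top-field rows use the even chroma row and
   the two bottom-field rows use the odd chroma row. */
int chroma_row_interlaced(int row)
{
    return (row / 4) * 2 + (row % 2);
}
```

Under that assumption a conversion step for interlaced material has to
cover four luma rows and two chroma rows to stay self-contained.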

Well, I would have suggested separating the image into three rectangular
planes: one big plane whose width and height are multiples of 16. For this
plane, optimized color conversion functions are called (regardless of
whether it's mmx, xmm or sse, all of them have to work with a multiple of
16). For the remaining two planes, two functions are called which convert
the remainder and at the same time fill the remaining part of our border
macroblocks with a sensible value (for image_input only). Btw: currently,
the remainder of a border MB that is maybe only half filled is not extended
or filled with a value at all, right? What does the standard say about this?
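
One way to picture the three-plane split, as a small C sketch. The rect
and image_split types and the split_image() name are hypothetical, and the
choice to give the bottom-right corner to the bottom strip is just one
possible convention. Width and height are assumed to be non-negative.

```c
typedef struct { int x, y, w, h; } rect;

typedef struct {
    rect core;   /* width and height are multiples of 16: optimized path */
    rect right;  /* leftover columns beside the core */
    rect bottom; /* leftover rows below the core, full image width */
} image_split;

image_split split_image(int width, int height)
{
    int w16 = width & ~15;  /* round down to a multiple of 16 */
    int h16 = height & ~15;
    image_split s;

    s.core   = (rect){ 0,   0,   w16,         h16 };
    s.right  = (rect){ w16, 0,   width - w16, h16 };
    s.bottom = (rect){ 0,   h16, width,       height - h16 };
    return s;
}
```

For a 350x250 image this gives a 336x240 core, a 14x240 right strip and a
350x10 bottom strip, which together cover every pixel exactly once.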

Anyway, I suppose your idea, while maybe a tad slower, might work better
for interlaced frames (?)...

> * c/mmx/etc. colorspace conversion functions only have to operate on a
>   single row. this makes the functions much simpler, thus easier to

A single row is bad. Colorspace functions should work on at least 2 rows at
a time. Since u,v are subsampled, processing two y rows and one u,v row in
one step is the natural way.
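
A plain-C sketch of such a two-row step, converting two luma rows and the
one chroma row they share to 32-bit RGB. The function name, the B,G,R,A
byte order and the integer BT.601 video-range coefficients are assumptions
chosen for illustration; they are not taken from the XviD sources.

```c
#include <stdint.h>

static uint8_t clip8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Write one pixel as B,G,R,A bytes (assumed layout). */
static void put_pixel(uint8_t *dst, int y, int u, int v)
{
    int c = 298 * (y - 16) + 128; /* scaled luma plus rounding term */
    int d = u - 128;
    int e = v - 128;

    dst[0] = clip8((c + 516 * d) / 256);           /* blue  */
    dst[1] = clip8((c - 100 * d - 208 * e) / 256); /* green */
    dst[2] = clip8((c + 409 * e) / 256);           /* red   */
    dst[3] = 0;                                    /* unused byte */
}

/* Convert `pixels` pixels of two adjacent luma rows. Each u,v sample is
   read once and reused for a 2x2 block of luma samples. */
void yv12_to_rgb32_rowpair_c(const uint8_t *y, const uint8_t *u,
                             const uint8_t *v, int y_stride,
                             uint8_t *dst, int dst_stride, int pixels)
{
    for (int x = 0; x < pixels; x++) {
        int cu = u[x / 2];
        int cv = v[x / 2];

        put_pixel(dst + 4 * x, y[x], cu, cv);
        put_pixel(dst + dst_stride + 4 * x, y[y_stride + x], cu, cv);
    }
}
```

Because the chroma samples are fetched once per pair of rows, a single-row
interface would either read them twice or need extra state between calls.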

bye,
Michael