[XviD-devel] Dev-API-4 : Possibility to mark a defined area of the encoded picture as 'BLACK=0' , to encode BlackBars with absolute minimum of bitrate ?

Tue Jul 29 12:10:19 CEST 2003

Hi GomGom,

thanks for your reply. Don't forget to visit us from time to time on 
irc.corecodec.com , #matroska , we could even found an anime-free #xvid 
channel there ;) .... LOL

Edouard Gomez wrote:

>Syskin has  better skills  in ME  than I, but  afaik, as  soon as  a SAD
>comparison is zero, the ME returns and that's all done. This implies not
>so much  CPU time  for the borders  (unless they  are not 16px  sized in
>height, one  MB line will take  a bit more of  CPU, but it  would be the
>case with "bb" hints anyway)
>So i don't think it's worth overbloating user capabilities for gaining a
>few cycles. Let's wait what our expert in ME says, c'mon syskin.
>  
>

sysKin replied to me in great detail on IRC, and unfortunately what he 
told me was not at all encouraging, as it seems MPEG4 is not really 
suited to black bars in the picture. I can't repeat all of his 
explanations, as I naturally only understood a small part of them ;-), 
but basically it seems that MPEG4 Motion Estimation has a hard time 
detecting motion if there is an edge, or even two of them, in the 
picture. sysKin mentioned that MPEG4 in theory offers the possibility to 
'partition' a picture, but this is neither implemented in XviD nor 
planned, nor does anybody know anything about it.

>Concerning the resolutions, i've heard  that DVD could use a newer video
>codec  in  near  future.  If  it's  MPEG4, you  should  stick  to  their
>recomandation no ? 
>
Well, from sysKin's explanation I learned that my considerations may be 
completely wrong. If it's true that MPEG4 can't handle black bars very 
well, it makes perfect sense that any hardware decoder HAS to offer much 
better resizing capabilities, to be able to handle various different 
resolutions and also interlaced material, than the old MPEG1/2 decoder 
chips had ( that was the main reason why DVD/SVCD/VCD only supported a 
couple of resolutions and aspect ratios at the beginning ). I still feel 
that it's a good idea to leave the normal 1:1 Pixel Aspect Ratio 
scenario behind: my latest test encodings have shown that a 560 x 352 
anamorphic encoding ( Starwars EP2, 136 mins, VHQ 4, 2 b frames, dx50 
vop, h.263 quant ) looked much better to my eyes than a 'normal' 
696 x 288 encoding with square pixels, while the number of pixels is 
almost the same for both. No idea if the codec has a 'preference' for a 
picture that is more 'square', or if it's just the visual perception of 
a higher vertical resolution; sysKin or you guys may answer this better, 
I guess.
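
To put numbers on "almost the same", here is a minimal Python sketch 
that counts pixels and 16x16 macroblocks for the two encodes. The 
function names are made up for illustration, and rounding partial 
macroblocks up is my assumption about how a frame that is not MOD 16 
gets padded.

```python
def pixel_count(width, height):
    """Number of coded pixels in one frame."""
    return width * height


def macroblock_count(width, height, mb=16):
    """Number of mb x mb blocks needed to cover the frame.

    Uses ceiling division, assuming partially covered blocks
    still have to be coded.
    """
    return -(-width // mb) * -(-height // mb)


encodes = {"anamorphic": (560, 352), "square": (696, 288)}
for name, (w, h) in encodes.items():
    print(name, w, "x", h, "->", pixel_count(w, h), "pixels,",
          macroblock_count(w, h), "macroblocks")
```

The two pixel counts differ by under 2 %, so the visible difference is 
unlikely to come from the raw amount of picture data alone.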

Unlike MP4, the matroska container allows basically any float to 
describe the output AR, but it's recommended to describe it as 2 
integers, one being the preferred width and the other the height of the 
output picture. For the example given above, I would set

w/h = 832/352

in mkvmerge, while 352 x 2.35 = 827.2 , leaving an aspect error of 
about 0.6 %. A software player supporting matroska would resize the 
picture to 832 x 352 on playback, so the full vertical resolution is 
preserved while the horizontal res is stretched accordingly. 
DirectShow players with full matroska support, like TCMP with the 
matroska CDL ( it can read the w/h from the track header *BEFORE* 
calling the graph ), will of course use DirectShow and thus the hardware 
overlay for that, so it's mainly neutral in terms of CPU power. All other 
players can do it using the latest ffdshow-alpha ( activate 'use 
overlay' in the config window ); in this case it will be ffdshow's 
internal resizing filter that first stretches the picture to the bigger 
width, before DirectShow expands it to fullscreen using hardware, so 
this variant is not completely free of additional CPU load, but I guess 
it's almost negligible. In principle this means that the encoding 
resolution can be chosen completely freely, as the output resolution 
will *ALWAYS* be determined by the w/h as set in the track header, and 
it's the job of the player/decoder filter to make sure this output AR 
is achieved, no matter what the input resolution is.

Now, for a hardware player this is of course more difficult to support, 
and for this very reason we would like to try to standardize certain 
picture resolutions and output resolutions for those users who believe 
that matroska may be supported on standalone units one day ( actually, 
I have email contact with 2 Asian companies about this right now, but 
to be honest, I guess they are more polite than really interested ;-) ). 
For the example above, with a 560 x 352 encoded res, on a PC this could 
be output as

560 x 420 = 4 : 3   ( vertical stretch, +20% )
656 x 352 = 16 : 9  ( horizontal stretch, +17% )
832 x 352 = 21 : 9  ( horizontal stretch, +48% )

or for fullscreen from the hardware output on normal TVs this would be

PAL , 768 x 576

768 x 576 = 4 : 3    ( horizontal stretch, +37%, vertical stretch + 64% )
768 x 412 = 16 : 9  ( horizontal stretch, +37%, vertical stretch +17%,  
resizer to add black borders on top/bottom )
768 x 324 = 21 : 9  ( horizontal stretch +37%, vertical -8%, resizer to 
add ..... )

NTSC, 640 x 480

640 x 480 = 4 : 3    ( horizontal stretch, +14%, vertical stretch + 36% )
640 x 342 = 16 : 9  ( horizontal stretch, +14%, vertical stretch -3%,  
resizer to add black borders on top/bottom )
640 x 272 = 21 : 9  ( horizontal stretch +14%, vertical -23%, resizer to 
add ..... )

For another example, if the user decides to encode his movie in

640 x 576 ( another res from the old list ), here are the equivalent 
numbers, first for the PC

768 x 576 = 4 : 3  ( horizontal stretch, +20% )
1024 x 576 = 16 : 9 ( horizontal stretch, +60% )
1344 x 576 = 21 : 9 ( horizontal stretch, +110% )

PAL , 768 x 576

768 x 576 = 4 : 3    ( horizontal stretch, +20%, vertical stretch + 0% )
768 x 412 = 16 : 9  ( horizontal stretch, +20%, vertical stretch -28%,  
resizer to add black borders on top/bottom )
768 x 324 = 21 : 9  ( horizontal stretch +20%, vertical -44%, resizer to 
add ..... )

NTSC, 640 x 480

640 x 480 = 4 : 3    ( horizontal stretch, +0%, vertical stretch -17% )
640 x 342 = 16 : 9  ( horizontal stretch, +0%, vertical stretch -40%,  
resizer to add black borders on top/bottom )
640 x 272 = 21 : 9  ( horizontal stretch +0%, vertical -53%, resizer to 
add ..... )
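
The stretch figures in the lists above can be recomputed with a short 
Python sketch. stretch_percent is a hypothetical helper, and rounding 
to whole percent is my choice, so a result may differ from a 
hand-computed figure by one percent here and there.

```python
def stretch_percent(coded, output):
    """Percent change per axis when scaling coded (w, h) to output (w, h).

    Positive means the resizer has to stretch, negative means shrink.
    """
    (cw, ch), (ow, oh) = coded, output
    horizontal = round(100 * (ow - cw) / cw)
    vertical = round(100 * (oh - ch) / ch)
    return horizontal, vertical


tv_targets = {
    "PAL 4:3": (768, 576), "PAL 16:9": (768, 412), "PAL 21:9": (768, 324),
    "NTSC 4:3": (640, 480), "NTSC 16:9": (640, 342), "NTSC 21:9": (640, 272),
}
for coded in ((560, 352), (640, 576)):
    print("encoded", coded)
    for name, output in tv_targets.items():
        print("  %-10s %s  h %+d%%, v %+d%%"
              % ((name, output) + stretch_percent(coded, output)))
```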

As you can see from the 2 examples above, there is no regularity in the 
numbers: very often the resizer would have to stretch the picture by 
uneven amounts ( in % ), or even to shrink it ( bad ). Now, preferably 
our list of recommended encoding resolutions should be made such that 
there is a certain mathematical relationship between them, in order to 
allow using a limited number of resizing algorithms, if this is 
possible, and still meet at least a MOD 8 criterion, preferably MOD 16. 
As this is a new situation now ( adding black borders would have made 
the whole thing much much easier to handle ;-) ), I will have to sit 
down and try to find a number of different possible resolutions that 
will make the job for the hardware resizers as simple as possible. Of 
course, it would help a lot if somebody with a background in resizing 
algos ( trbarry ? ) would give me some hints first about what makes a 
resizing job easy or not for a hardware device ( hint, hint :-) ). I 
will inform you guys if I manage to come up with something sensible .....
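
One way to make the "no regularity" point concrete is to write each 
scale factor as a reduced fraction and to check the MOD criterion. The 
Python sketch below does that with hypothetical helper names; treating 
small numerators and denominators as "easier for the resizer" is only 
my guess, which is exactly the part I would like the experts to confirm.

```python
from fractions import Fraction


def scale_ratios(coded, output):
    """Horizontal and vertical scale factors as reduced fractions."""
    (cw, ch), (ow, oh) = coded, output
    return Fraction(ow, cw), Fraction(oh, ch)


def is_mod(size, mod=16):
    """True if both width and height are multiples of `mod`."""
    width, height = size
    return width % mod == 0 and height % mod == 0


pal = (768, 576)
for coded in ((560, 352), (640, 576), (696, 288)):
    h_ratio, v_ratio = scale_ratios(coded, pal)
    print(coded, "-> PAL:", h_ratio, "x", v_ratio,
          "| MOD 16:", is_mod(coded), "| MOD 8:", is_mod(coded, 8))
```

For 560 x 352 the factors to PAL come out as 48/35 and 18/11, while 
640 x 576 only needs 6/5 horizontally and nothing vertically.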

Sorry for the long email and the inevitable matroska advertising in it.

Regards

Christian