[XviD-devel] [PATCH] into CVS now

Edouard Gomez xvid-devel@xvid.org
Fri, 17 Jan 2003 14:37:58 +0100 (CET)


In reply to Marco "elcabesa" Belli <elcabesa@inwind.it>:
> ok, after your commit I tested it, and with this configuration it makes
> a wrong file with artifacts on moving objects
> 
> mencoder -dvd 1 -oac mp3lame -lameopts br=128:vbr=3 -ovc xvid \
> -xvidencopts fixed_quant=3:me_quality=6:max_bframes=4:bquant_ratio=120:gmc=1:me_colour=1:lumi_mask=1:4mv=1 \
> -vop scale=640:264,crop=710:420:5:78 -endpos 1:31:25 -sws 2 -alang it -o prova.avi \
> 
> this means XviD using fixed quant, bframes 4:120:0, GMC, chroma ME, lumi
> mask and 4MV
>

My patch does not touch how the features work. It modifies the way we code the
final coefficients, using smaller lookup tables for variable length encoding.

> simply disabling GMC, it suddenly works =)

Then the bug is probably in GMC.

> what do you mean by the "Much lighter VLC implementation (saves >6MB)" log
> message? And what are LUTs?

VLC = Variable Length Coding

This is the final compression stage: when writing the 8x8 blocks to the
bitstream, we write each nonzero coefficient as a (last, run, level) triplet.
'last' is 1 bit long and tells the decoder whether this is the final nonzero
coefficient of the block. 'run' is a run length that tells the decoder how many
zeros precede this nonzero coefficient. 'level' is the value of the encoded
block element.
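A minimal Python sketch of how one block turns into triplets, assuming the 64
coefficients are already in zigzag scan order. It keeps the sign inside 'level'
and ignores the separate DC coding of intra blocks, so it only illustrates the
idea and is not XviD's actual code; the function name block_to_triplets is
hypothetical.

```python
def block_to_triplets(coeffs):
    """Turn a 64-entry block, already in zigzag scan order, into
    (last, run, level) triplets, one per nonzero coefficient.

    Zeros after the final nonzero coefficient are not coded at all:
    last=1 on the final triplet tells the decoder the rest is zero.
    """
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    triplets = []
    run = 0
    for i, c in enumerate(coeffs):
        if c == 0:
            run += 1  # count the zeros preceding the next nonzero value
            continue
        last = 1 if i == nonzero_positions[-1] else 0
        triplets.append((last, run, c))
        run = 0
    return triplets


block = [12, 0, 0, -3, 1] + [0] * 59
print(block_to_triplets(block))  # [(0, 0, 12), (0, 2, -3), (1, 0, 1)]
```

Three triplets are enough to describe the whole 64-coefficient block, which is
where the compression comes from.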

To speed up variable length coding we use Look Up Tables (the LUTs), which we
address with a 3D address; as you might guess, the coordinates are last, run
and level. The old implementation used >10MB of LUTs on my system (because all
the structures were aligned on 32-bit boundaries).
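A sketch of the 3D addressing, with the (last, run, level) coordinates flattened
into one offset. The dimensions (|level| up to 32, escape coding beyond that)
and the names lut_index and build_placeholder_table are assumptions for
illustration, not XviD's real table layout, and the placeholder entries stand in
for the real (code, length) pairs.

```python
# Hypothetical dimensions for illustration only, NOT XviD's real table layout.
LAST_VALUES = 2                        # last is 0 or 1
RUN_VALUES = 64                        # run is 0..63 inside an 8x8 block
MAX_ABS_LEVEL = 32                     # assumed: larger |level| uses escape coding
LEVEL_VALUES = 2 * MAX_ABS_LEVEL + 1   # level in -32..32


def lut_index(last, run, level):
    """Flatten the 3D address (last, run, level) into one table offset.

    Returns None when |level| is outside the table, i.e. when an encoder
    would have to fall back to escape coding (not shown here).
    """
    if not (0 <= last < LAST_VALUES and 0 <= run < RUN_VALUES):
        raise ValueError("last or run out of range")
    if abs(level) > MAX_ABS_LEVEL:
        return None
    return (last * RUN_VALUES + run) * LEVEL_VALUES + (level + MAX_ABS_LEVEL)


def build_placeholder_table():
    """Dense table with one slot per (last, run, level) address.

    A real table would hold (code, length) pairs taken from the MPEG-4 VLC
    tables; here each slot stores its own offset so the addressing can be checked.
    """
    return list(range(LAST_VALUES * RUN_VALUES * LEVEL_VALUES))


table = build_placeholder_table()
print(table[lut_index(0, 2, -3)])
```

The lookup is a single multiply-add chain and one memory access, which is why
the table size, rather than the instruction count, dominates the cost.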

The new implementation uses much smaller LUTs that are 64KB long. At first we
hoped to save memory bandwidth at the expense of a few more CPU instructions:
the big LUTs were probably killing the performance of the D2 (second-level data)
cache, which is generally 128KB or 256KB. As I posted a few weeks ago, switching
from big LUTs to small LUTs did help the D2 cache, bringing the miss rate down
from 0.9% to 0.1% (according to valgrind cache simulations). But the overall
performance slowed down a bit (1%).
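Some rough footprint arithmetic behind the cache argument. The mail does not
give the table layout, so the dimensions below (intra/inter x last x run x 64
level values, 4-byte entries) are an assumption chosen to land on 64KB; the 10MB
figure and the 128KB/256KB cache sizes come from the text above.

```python
KIB = 1024
MIB = 1024 * KIB


def table_bytes(*dims, entry_bytes):
    """Size in bytes of a dense table with the given dimensions."""
    n = 1
    for d in dims:
        n *= d
    return n * entry_bytes


# Assumed compact layout: [intra/inter][last][run][level bucket], 4-byte entries.
small = table_bytes(2, 2, 64, 64, entry_bytes=4)
big = 10 * MIB  # the ">10MB" figure quoted above, used as a lower bound

for cache in (128 * KIB, 256 * KIB):
    print(f"L2 {cache // KIB} KiB: small LUT uses {small / cache:.0%}, "
          f"old LUT is {big / cache:.0f}x the cache")
```

With these numbers the small table fills half of a 128KB cache and a quarter of
a 256KB one, while the old tables are 80 and 40 times larger than those caches.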

I called for comments and it seemed clear that everyone agreed on using small
LUTs to save memory. When using 0.9.0 (or dev-api-3, since the patch is in CVS),
it seems stupid to lose 10MB on LUTs when the frame buffers occupy roughly the
same amount of space. With dev-api-3 the situation is even worse, because we
buffer more frames for B-frames. Not using small LUTs would have meant
deliberately wasting RAM to gain 1% overall speed.

--
Edouard Gomez