[XviD-devel] Re: Re: performance analysis results

Sun Mar 9 19:56:27 CET 2003

When I said cycles, I meant time.
On Sun, 9 Mar 2003, Venkata Tumati wrote:
> Hi,
> I was doing some performance analysis using SGI Speedshop on xvid.
> 1st setup was using xvid_20021230 and the 2nd setup was the latest
> download from yesterday xvidcore-0.9.1. 

You mean the xvidcore-0.9.1 archive, available from www.xvid.org?
Or some CVS version?

>>>>>Yes xvidcore-0.9.1 from xvid.org

> The results are pretty contrasting. Any way here is the summary. Also 
> I made some changes to xvid_encraw.c and xvid_decraw.c to encode 3 
> visual objects; I create three instances of the encoder and store the 
> results sequentially in a file. For decoding I use the final output 
> from the encoding of 3 visual objects and decode it. I hope this is how
> multiple visual objects are encoded.

I'm pretty sure that it's not, because XVID doesn't support multiple
visual objects. You would have to change several MPEG headers, not just
encode three times.

>>>>>What do you mean? Can I get more information on how to encode multiple
>>>>>visual objects? I don't want to do something wrong and get garbage
>>>>>numbers.

> All the tests were run on an SGI R12k at 300 MHz. I don't know the exact
> cache config, but I think it has a 2 MB L2 cache.

I found some CPU specs, maybe wrong, but: 

L1 cache size - 32 KB for instruction cache and 32 KB for data cache
L1 cache line - 64 bytes for instruction cache and 32 bytes for data cache
L2 cache line - 128 bytes

That's pretty weird... a 32-byte cacheline for the L1 data cache on a
64-bit CPU??? But 128 bytes for L2...

If you are interested in the truth, you can test it yourself with
one of the many memory/latency benchmarks like the great

http://www.cwi.nl/~manegold/Calibrator/calibrator.shtml

Usually the peaks at the L1 and L2 cache sizes are clearly visible,
as are the stride values that correspond to the cacheline length.
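
To illustrate why the stride at which the curve flattens corresponds to
the cacheline length, here is a toy model in Python. It is not a
measurement and not how Calibrator works internally: it just simulates a
direct-mapped cache in software. The 32 KB size and 32-byte line are the
unverified data cache figures quoted above, and the name miss_rate is
made up for this sketch.

```python
def miss_rate(stride, line_size=32, cache_size=32 * 1024, span=1024 * 1024):
    """Fraction of accesses that miss in a toy direct-mapped cache.

    Walks `span` bytes once at the given stride (all sizes in bytes).
    The default sizes are assumptions, not verified R12k values.
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # memory line currently held in each slot
    misses = accesses = 0
    for addr in range(0, span, stride):
        line = addr // line_size       # which memory line this byte lives in
        slot = line % num_lines        # direct-mapped: exactly one possible slot
        if tags[slot] != line:         # line not cached yet: count a miss, load it
            misses += 1
            tags[slot] = line
        accesses += 1
    return misses / accesses


# Print the modelled miss rate for a few strides.
for stride in (4, 8, 16, 32, 64, 128):
    print(stride, miss_rate(stride))
```

In this model the miss rate grows with the stride until the stride
reaches the line size and stays flat after that, so the knee marks the
cacheline length. Real hardware adds associativity, prefetching and TLB
effects, so measured curves are less clean.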

> Xvidcore-0.9.1: (look at the total time for each config)
> [...]
> Does anybody know why we are having more L1 cache misses, and what
> changes to the code are causing this? Is this due to some
> optimizations when the code is compiled?

This slowdown is too much to be explained by 0.5% of extra L1
cache-misses. Most likely other options have changed, too, either
compiling or encoding/decoding. 
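
A back-of-the-envelope way to check this is to bound what the extra
misses can cost. The sketch below is Python, and every number in it is a
placeholder, not a value from the Speedshop results (those are elided
above). It also assumes "0.5%" means the miss rate rose by 0.5 percentage
points, and that each L1 miss that hits in L2 costs a fixed number of
cycles. The function name is hypothetical.

```python
def slowdown_from_extra_misses(mem_accesses, base_cycles,
                               extra_miss_rate, miss_penalty_cycles):
    """Relative slowdown caused only by additional L1 misses.

    extra_miss_rate is a fraction (0.005 for 0.5 percentage points).
    """
    extra_cycles = mem_accesses * extra_miss_rate * miss_penalty_cycles
    return extra_cycles / base_cycles


# Placeholder inputs (assumptions): 1e9 memory accesses, 3e9 cycles total,
# 0.5 points more L1 misses, 10 cycles per L1 miss served from L2.
estimate = slowdown_from_extra_misses(1_000_000_000, 3_000_000_000, 0.005, 10)
print(f"{estimate:.1%}")
```

With placeholder inputs of this order the extra misses account for a
slowdown of a percent or two, so a much larger change in total time has
to come from somewhere else.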

Maybe you can do a profiling run to tell which routine it was that
slowed down that much? 

gruel