[XviD-devel] Question about bvop decoding
Christoph Lampert
chl at math.uni-bonn.de
Tue Jul 20 17:22:46 CEST 2004
On Tue, 20 Jul 2004, Edouard Gomez wrote:
> Hmm i don't have such a profile at all...
> See:
> http://ed.gomez.free.fr/vrac/profile-gprof.txt
> http://ed.gomez.free.fr/vrac/profile-oprofile.txt
valgrind/cachegrind seems to produce results similar to yours:
decode_bf_interpolate_mbinter has 14% of instructions and
5.3% of total CPU cycles. With both, it's top of the list, followed by
decoder_bframes (7.3% of instructions) and decode_mbinter (6.8%).
The largest portion is due to the complicated calculation of
const uint8_t *const src = refn + (int)((y+(dy>>1))*stride+x+(dx>>1));
and the less complicated
uint8_t *const dst = cur + (int)(y*stride+x);
switch (((dx&1)<<1)+(dy&1)) {
Those are in fact not in decoder.c, but inlined from
interpolate8x8_switch(), which is called 6 times per MB.
So I guess that high number of cycles is due to counting inlined code.
Have you maybe checked how big interpolate_mbinter is in the ASM step?
Indeed, this way of calling interpolate8x8_switch isn't optimal,
I guess, since all addresses are recalculated on each call, and in most
cases the vectors will not even be different.
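To make the idea concrete, here is a rough sketch of computing the addressing once per macroblock when all four luma blocks share one vector. It is not taken from decoder.c: blk_addr, blk_addr_calc and mb_luma_addrs are hypothetical names, and it assumes the four 8x8 luma blocks sit at (x,y), (x+8,y), (x,y+8) and (x+8,y+8) in a plane with the given stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of XviD: the addressing that is
 * otherwise recomputed on every interpolate8x8_switch() call. */
typedef struct {
    ptrdiff_t src_off; /* offset into the reference plane (refn) */
    ptrdiff_t dst_off; /* offset into the current plane (cur)    */
    int       mode;    /* ((dx&1)<<1)+(dy&1), the switch selector */
} blk_addr;

/* Same arithmetic as the inlined code quoted above.
 * Like the original, this relies on >> of a negative int being an
 * arithmetic shift, which is implementation-defined in C. */
static blk_addr blk_addr_calc(int x, int y, int dx, int dy, int stride)
{
    blk_addr a;
    a.src_off = (ptrdiff_t)(y + (dy >> 1)) * stride + x + (dx >> 1);
    a.dst_off = (ptrdiff_t)y * stride + x;
    a.mode    = ((dx & 1) << 1) + (dy & 1);
    return a;
}

/* Assumption: one vector (dx,dy) for all four luma blocks of the MB.
 * The full calculation is done once; the other three blocks differ
 * only by a constant step, and the interpolation mode is identical. */
static void mb_luma_addrs(int x, int y, int dx, int dy, int stride,
                          blk_addr out[4])
{
    const blk_addr base = blk_addr_calc(x, y, dx, dy, stride);
    int i;

    for (i = 0; i < 4; i++) {
        /* block i is 8 pixels right if (i&1), 8 lines down if (i>>1) */
        const ptrdiff_t step = 8 * (i & 1) + (ptrdiff_t)8 * (i >> 1) * stride;
        out[i].src_off = base.src_off + step;
        out[i].dst_off = base.dst_off + step;
        out[i].mode    = base.mode;
    }
}
```

A caller would then use refn + out[i].src_off and cur + out[i].dst_off, and pick the interpolation routine from out[i].mode once instead of four times. Whether this actually saves cycles would have to be measured, since the compiler may already fold part of this after inlining.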
chl