[XviD-devel] Question about bvop decoding
skal
skal at planet-d.net
Wed Jul 21 11:07:45 CEST 2004
Hi all,
On Tue, 2004-07-20 at 21:24, Edouard Gomez wrote:
> Christoph Lampert (chl at math.uni-bonn.de) wrote:
> > valgrind/cachegrind seems to produce results similar to yours,
> > decode_bf_interpolate_mbinter has 14% of instructions, and
> > 5.3% of total CPU cycles. With both, it's top of the list, followed by
> > decoder_bframes (7.3% of instructions) and decode_mbinter(6.8%).
>
> Glad to see I'm not crazy, and/or that my box doesn't behave like
> it's part of the 4th dimension!
>
> > The largest portion is due to complicated calculation of
> >
> > const uint8_t *const src = refn + (int)((y+(dy>>1))*stride+x+(dx>>1));
> >
> > and the less complicated
> >
> > uint8_t *const dst = cur + (int)(y*stride+x);
> >
> > switch (((dx&1)<<1)+(dy&1)) {
> >
> > Those are in fact not in decoder.c, but inlined from
> > interpolate8x8_switch(), which is called 6 times per MB.
> > So I guess that high number of cycles is due to counting inlined code.
> > Have you maybe checked how big interpolate_mbinter is in the ASM step?
This is most probably the bigger part, more so than the above
address calculations... Unfortunately, gprof can't instrument the
ASM code.
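For reference, here is a minimal plain-C sketch of what a function like interpolate8x8_switch() does for one 8x8 block. The integer part of the half-pel vector (dx>>1, dy>>1) selects the source address, and the two low bits pick full-pel copy, vertical, horizontal, or diagonal half-pel averaging, with the usual MPEG-4 rounding-control bit. The name interp8x8_switch_sketch and the per-pixel loops are hypothetical; XviD's real code dispatches to separate (often ASM) routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C stand-in for interpolate8x8_switch(): predicts the
 * 8x8 block at (x, y) of `cur` from `refn`, using a motion vector (dx, dy)
 * in half-pel units. `rounding` is the MPEG-4 rounding-control bit (0 or 1).
 * The reference must have one extra row/column available for half-pel taps. */
static void interp8x8_switch_sketch(uint8_t *cur, const uint8_t *refn,
                                    int x, int y, int dx, int dy,
                                    int stride, int rounding)
{
    /* Integer-pel part of the vector selects the source address. */
    const uint8_t *const src = refn + (int)((y + (dy >> 1)) * stride + x + (dx >> 1));
    uint8_t *const dst = cur + (int)(y * stride + x);
    /* Half-pel bits: 0 = full-pel, 1 = vertical, 2 = horizontal, 3 = diagonal. */
    const int mode = ((dx & 1) << 1) + (dy & 1);

    assert(rounding == 0 || rounding == 1);
    for (int j = 0; j < 8; j++) {
        const uint8_t *a = src + j * stride;   /* current source row */
        const uint8_t *b = a + stride;         /* row below it       */
        uint8_t *d = dst + j * stride;
        for (int i = 0; i < 8; i++) {
            int v;
            switch (mode) {
            case 0:  v = a[i]; break;
            case 1:  v = (a[i] + b[i] + 1 - rounding) >> 1; break;
            case 2:  v = (a[i] + a[i + 1] + 1 - rounding) >> 1; break;
            default: v = (a[i] + a[i + 1] + b[i] + b[i + 1] + 2 - rounding) >> 2; break;
            }
            d[i] = (uint8_t)v;
        }
    }
}
```

Called once per 8x8 block, this runs 6 times per macroblock in 4:2:0 video (4 luma blocks, 2 chroma blocks).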
>
> I'm still amazed the CK kernel could bring a 15% improvement for
> free (of course, that implies you do nothing but decoding)
Let's reverse the point of view: how could previous
kernels spend 15% of their time doing counter-productive
things? :))
But back on topic:
Ed, come on, there's no need to use complicated profiling
techniques to see where interpolate mode could be
improved. For interp mode, you're doing:
a) fwd predict into buf1
b) bwd predict into buf2
c) average buf1 and buf2 and send to pic.
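The three steps above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not XviD's code: vectors are full-pel only (so each prediction is a plain block copy), and the intermediate buffers are assumed to be 16-bit, to show where the 8-to-16-bit widening and 16-to-8-bit narrowing happen. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical full-pel prediction into a 16-bit scratch buffer. */
static void predict8x8_to16(int16_t *buf, const uint8_t *ref, int stride)
{
    for (int j = 0; j < 8; j++)
        for (int i = 0; i < 8; i++)
            buf[j * 8 + i] = ref[j * stride + i];   /* widen 8 -> 16 bits */
}

/* Three-pass bidirectional prediction of one 8x8 block. */
static void interp8x8_three_pass(uint8_t *dst, const uint8_t *fwd,
                                 const uint8_t *bwd, int stride)
{
    int16_t buf1[64], buf2[64];

    assert(stride >= 8);
    predict8x8_to16(buf1, fwd, stride);   /* a) forward prediction  -> buf1 */
    predict8x8_to16(buf2, bwd, stride);   /* b) backward prediction -> buf2 */
    for (int j = 0; j < 8; j++)           /* c) average, narrow, store to picture */
        for (int i = 0; i < 8; i++)
            dst[j * stride + i] =
                (uint8_t)((buf1[j * 8 + i] + buf2[j * 8 + i] + 1) >> 1);
}
```

Each 8x8 block thus writes two 64-entry scratch buffers and reads them back in a separate pass before anything reaches the picture.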
Now it's pretty obvious you're losing
time in memory I/O and 16/8-bit conversion
during steps b) and c). These two could be
merged into a single 'averaging' bwd predicting
step.
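One way to read the suggested merge, sketched in C under simplifying assumptions (full-pel vectors only, hypothetical function names): the backward prediction is computed and averaged into the destination in the same loop, using the usual (f + b + 1) >> 1 rounding, so steps b) and c) become one pass with no second buffer and no 16-bit round trip. This sketch also writes step a) straight into the picture; with half-pel vectors, the averaging variant would apply the half-pel filter and the average in the same inner loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical full-pel forward prediction, written directly into the picture. */
static void copy8x8(uint8_t *dst, const uint8_t *src, int stride)
{
    for (int j = 0; j < 8; j++)
        for (int i = 0; i < 8; i++)
            dst[j * stride + i] = src[j * stride + i];
}

/* 'Averaging' backward prediction: dst = (dst + src + 1) >> 1, in place. */
static void copy8x8_avg(uint8_t *dst, const uint8_t *src, int stride)
{
    for (int j = 0; j < 8; j++)
        for (int i = 0; i < 8; i++) {
            uint8_t *d = &dst[j * stride + i];
            *d = (uint8_t)((*d + src[j * stride + i] + 1) >> 1);
        }
}

/* Bidirectional prediction of one 8x8 block with b) and c) fused. */
static void interp8x8_fused(uint8_t *dst, const uint8_t *fwd,
                            const uint8_t *bwd, int stride)
{
    assert(stride >= 8);
    copy8x8(dst, fwd, stride);       /* a) forward prediction into the picture */
    copy8x8_avg(dst, bwd, stride);   /* b)+c) backward prediction, averaged in place */
}
```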
Now, if you dig very hard into this mailing list's
archive, you may find that I once, a long time
ago, sent ASM code that does the combined b)+c) step... :)
bye!
Skal