[XviD-devel] Request: optimized version of image_setedges

Michael Militzer xvid-devel@xvid.org
Sun, 7 Jul 2002 13:23:45 +0200


Hi,

> I just saw while profiling SMP that image_setedges() is one
> of the slowest parts of XviD now. I doubt that it needs to be!
>
> I guess the reason is the many loops and the many calls to the library
> functions memcpy/memset for very small memory blocks of 32 or even 16
> bytes, which could be done much faster by loop unrolling or an MMX copy.
>
> There could be a fixed "copy 16 bytes by MMX" inlined function and
> something tricky for memset(), too. However, I don't know enough
> MMX/assembler for that.
>
> Anyone else?

I believe I already did this when I profiled the decoder using AMD's
CodeAnalyst. The code must still be floating around somewhere on my hard
drive. I had also written a fast MMX memset replacement (which is a really
easy task, even without knowing much about MMX...). It helps not only in the
setedges routine: memset is also used to clear the coeff blocks while
decoding (in decoder_mbintra and _inter, I believe), and memset slows things
down a bit there too.

I'll see if I can find the code again...


Michael