[XviD-devel] Request: optimized version of image_setedges

Christoph Lampert xvid-devel@xvid.org
Sun, 7 Jul 2002 13:53:43 +0200 (CEST)


On Sun, 7 Jul 2002, Michael Militzer wrote:

> Hi,
> 
> > I just saw while profiling SMP that image_setedges() is one
> > of the slowest parts in XviD now. I doubt it has to be that slow!
> >
> > I guess the reason is the many loops and the many calls to the library
> > functions memcpy/memset for very small memory blocks of 32 or even 16
> > bytes, which could be done much faster by loop unrolling or an MMX copy.
> >
> > There could be a fixed "copy 16 bytes by MMX" inlined function and
> > something tricky for memset(), too. However, I don't know enough
> > MMX/assembler for that.
> >
> > Anyone else?
> 
> I believe I already did this when I profiled the decoder using AMD's
> CodeAnalyst. The code must be floating around somewhere on my hd. I had also
> written a fast MMX memset replacement (which is a really easy task even
> without knowing much about MMX...). It helps not only within the setedges
> routine: memset is also used to clear the coeff blocks while decoding
> (decoder_mbintra and _inter, I believe), and it slows things down a bit
> there.
> 
> I'll look if I can find the code again...
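A minimal sketch of the fixed-size helpers discussed above, in portable C rather than MMX. This is not Michael's code and not XviD's actual image_setedges; the names copy16, fill16, clear_block64 and replicate_row_edges are hypothetical. It assumes coefficient blocks are 8x8 arrays of 16-bit values and that edge widths are multiples of 16. The idea is that memcpy with a compile-time constant size is usually expanded by the compiler into plain register moves, so no library call is made, and memset of one byte value can be replaced by broadcasting that byte into a 64-bit word.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: copy exactly 16 bytes. The constant sizes let the
 * compiler turn these memcpy calls into two 8-byte loads and stores. */
static inline void copy16(uint8_t *dst, const uint8_t *src)
{
    uint64_t a, b;
    memcpy(&a, src, 8);
    memcpy(&b, src + 8, 8);
    memcpy(dst, &a, 8);
    memcpy(dst + 8, &b, 8);
}

/* Hypothetical memset(dst, value, 16) replacement: broadcast the byte
 * into all 8 bytes of a 64-bit word, then store that word twice. */
static inline void fill16(uint8_t *dst, uint8_t value)
{
    const uint64_t v = (uint64_t)value * 0x0101010101010101ULL;
    memcpy(dst, &v, 8);
    memcpy(dst + 8, &v, 8);
}

/* Hypothetical: zero one 8x8 block of int16_t coefficients
 * (64 * 2 = 128 bytes) with an unrolled loop of 8-byte stores. */
static inline void clear_block64(int16_t *block)
{
    uint8_t *p = (uint8_t *)block;
    const uint64_t zero = 0;
    for (int i = 0; i < 128; i += 32) { /* 4 iterations, 4 stores each */
        memcpy(p + i, &zero, 8);
        memcpy(p + i + 8, &zero, 8);
        memcpy(p + i + 16, &zero, 8);
        memcpy(p + i + 24, &zero, 8);
    }
}

/* Hypothetical edge padding for one row, in the spirit of setedges:
 * `row` points at the first visible pixel, and `edge` bytes of padding
 * (a multiple of 16) exist on both sides. The leftmost and rightmost
 * pixels are replicated into the padding. */
static void replicate_row_edges(uint8_t *row, int width, int edge)
{
    for (int i = 0; i < edge; i += 16) {
        fill16(row - edge + i, row[0]);
        fill16(row + width + i, row[width - 1]);
    }
}
```

Whether this actually beats the library calls depends on the compiler and CPU, so it would need to be profiled the same way image_setedges() was.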

Fine. I was just looking around to see what could possibly be
multithreaded. I thought of image_interpolate, because it used to be rather
slow, but now there doesn't seem to be a real #1 candidate at the moment.
Maybe I'll manage to run the whole FrameCodeP in several threads...

Christoph 

-- 
Christoph H. Lampert chl@math.uni-bonn.de | This signature was machine-
Beringstr. 6, Room 14 Tel. (0228) 73-2948 | generated and requires no
Office hours: none, but usually around    | handwritten signature. Ref. 27B-6