[XviD-devel] mmx transfer8x8/16x16 optimization, setedges, etc.

Alban Bedel xvid-devel@xvid.org
Tue, 16 Jul 2002 03:31:52 +0200


Hi peter ross,

on Tue, 16 Jul 2002 10:02:38 +1000 you wrote:

> 
> 
> at the request of alban, i've assembled two new functions. i say assembled 
> because it basically involved copying and pasting that "skals" chap's code.
> xvid currently has a transfer8x8_copy_[c,mmx] function, which copies an 8x8 
> block from one buffer to another. this function is limited by the assumption 
> that both buffers have the same stride. the new functions, 
> transfer8x8copy_[c,mmx] and transfer16x16copy_[c,mmx], provide stride arguments 
> for both destination and source buffers.
> 
> NOW, on this machine (p3,800mhz) the new 8x8copy function is *1* cycle 
> slower than current 8x8_copy function. so i reckon we can replace it with 
> the new one (rather than maintaining two pieces of near identical code). any 
> opinions?
> 
> speedwise, the mmx8x8 copy is ~25% faster than the c/memcpy version. the 
> mmx16x16 copy is ~75% faster than the c/memcpy version.
> alban this should speed up your ouput_mb function a little. AND, the 
> 16x16copy can now replace the four 8x8copy's used to copy not-coded mbs (in 
> decoder_pframe function). please perform some tests alban and post the 
> results. Note: *I* will commit this code in a day-or-so. btw, the 16x16 
> could also be sse2'd.
Thx a lot. I didn't expect it so soon.
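
For anyone following along, here is a rough plain-C sketch of what a block copy with separate source and destination strides looks like. The names copy_block, copy8x8 and copy16x16 are made up for illustration and are not the actual XviD signatures; the real functions are the [c,mmx] ones peter describes above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a size x size block of bytes.
 * dst and src each have their own stride (bytes per row). */
static void copy_block(uint8_t *dst, int dst_stride,
                       const uint8_t *src, int src_stride,
                       int size)
{
    assert(dst != NULL && src != NULL);
    for (int y = 0; y < size; y++) {
        memcpy(dst, src, (size_t)size);  /* one row of the block */
        dst += dst_stride;               /* next row in destination */
        src += src_stride;               /* next row in source */
    }
}

/* Hypothetical 8x8 wrapper. */
static void copy8x8(uint8_t *dst, int dst_stride,
                    const uint8_t *src, int src_stride)
{
    copy_block(dst, dst_stride, src, src_stride, 8);
}

/* Hypothetical 16x16 wrapper: same result as four 8x8 copies. */
static void copy16x16(uint8_t *dst, int dst_stride,
                      const uint8_t *src, int src_stride)
{
    copy_block(dst, dst_stride, src, src_stride, 16);
}
```

A single copy16x16 call covers the same pixels as four copy8x8 calls at offsets (0,0), (8,0), (0,8) and (8,8), which is why it can stand in for them when copying a not-coded macroblock.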

> alban and i were discussing the possibility of "on-the-fly" set_edges for 
> decoding. why bother? well, there's two reasons.
> 1. potential for added decoding speed. i suspect that the edges are rarely 
> used, so by not calling image_setedges we might save some time.
> 2. support for mode-1 direct rendering. in this mode, xvid uses the VIDEO-RAM 
> as its internal reference-frame buffer. the problem in doing so is that 
> VIDEO-RAM has no room for edges (hence the need for on-the-fly).

Sorry, I'm quite bad at explanations, so let me try to explain it better. Currently xvid
allocates its own buffers and needs to read back from them. Some video drivers
use a buffer in RAM (thus readable), so if xvid used that buffer instead it
would save one memcpy. But usually we can't get a buffer larger than the target
picture, so there is no space for the edges.
The on-the-fly copy is used when the target buffer is in video RAM, because you
can't read from there (you can, but it's slow as hell). Also, when colorspace
conversion is needed, it should take place there for better performance.
I hope that's a bit clearer.
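
To make the "no space for the edges" problem concrete, here is a rough C sketch of one way edges could be produced on the fly: clamp the source coordinates when fetching a reference block, so pixels outside the picture repeat the nearest border pixel. This assumes the edges are plain replication of the border pixels, and the names clamp_int and fetch8x8_clamped are invented for the example; it is not how xvid currently does it.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v into the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: fetch an 8x8 reference block whose top-left
 * corner (x, y) may lie partly or fully outside the frame. Coordinates
 * outside the picture are clamped, which replicates the border pixels
 * without needing a padded buffer. */
static void fetch8x8_clamped(uint8_t *dst, int dst_stride,
                             const uint8_t *frame, int frame_stride,
                             int width, int height, int x, int y)
{
    assert(width > 0 && height > 0);
    for (int j = 0; j < 8; j++) {
        int sy = clamp_int(y + j, 0, height - 1);
        for (int i = 0; i < 8; i++) {
            int sx = clamp_int(x + i, 0, width - 1);
            dst[j * dst_stride + i] = frame[sy * frame_stride + sx];
        }
    }
}
```

The per-pixel clamp is slow, so in practice it would only be worth taking this path for blocks that actually cross the picture border and using a plain copy everywhere else.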

As I mentioned in another mail here, DOCS/tech/dr-methods.txt in the mplayer
sources might be a good read on this subject, even if it's a bit mplayer-centric ;)

> also, i've been sitting on ircnet#xvid a little bit lately. syskin & mf 
> asked that you, the dev team, drop by some time for a chin wag (that's slang 
> for CHAT).
> 
That would be nice ;)
	Albeu