[XviD-devel] XVID profiling

Mon Mar 3 22:42:31 CET 2003

Btw.

is there any software out there that uses colorspaces other than
I420 (YV12) or RGB24/16 for _input_ material?

And does decoding on Windows (in a video player) include a step like
yv12_to_yv12, or is the image overlaid directly from internal buffers (as
MPlayer does using CSP_USER)?

gruel 

On Mon, 3 Mar 2003, Klaus Post (KPO) wrote:

> I've been doing some work with your color conversion routines, and I
> noticed some minor details.
> 
> YUY2 to YV12 conversion.
> You might be able to squeeze out a tiny bit of performance by writing an
> integer SSE version.
> The chroma could be averaged using pavgb instead of unpacking to
> words, adding, and dividing by bit-shifting. You should also round
> before shifting; currently you are doing:
> 
> chroma = (upper line + lower line) >> 1 
> 
> and not:
> 
> chroma = (upper line + lower line + 1 ) >> 1
> 
> pavgb does have "proper" rounding.
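For reference, a minimal scalar C sketch of the YUY2 to YV12 step for one pair of source lines, using the rounded average described above. The function names are hypothetical, not XviD's API, and the sketch assumes the common YUY2 byte order Y0 U Y1 V and an even width. `avg_round` computes (a + b + 1) >> 1 for one byte, which is what pavgb does for eight bytes at once.

```c
#include <stdint.h>
#include <stddef.h>

/* Rounded average of two bytes: (a + b + 1) >> 1, the same per-byte
   result the pavgb instruction produces. */
static uint8_t avg_round(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical scalar reference: converts two YUY2 lines (Y0 U Y1 V order,
   `width` pixels each, width assumed even) into two Y lines plus one U and
   one V line of width/2 samples, averaging chroma vertically with rounding. */
static void yuy2_rows_to_yv12(const uint8_t *src0, const uint8_t *src1,
                              uint8_t *y0, uint8_t *y1,
                              uint8_t *u, uint8_t *v, size_t width)
{
    for (size_t x = 0; x < width / 2; x++) {
        const uint8_t *p0 = src0 + 4 * x;   /* upper line, pixel pair x */
        const uint8_t *p1 = src1 + 4 * x;   /* lower line, pixel pair x */
        y0[2 * x]     = p0[0];
        y0[2 * x + 1] = p0[2];
        y1[2 * x]     = p1[0];
        y1[2 * x + 1] = p1[2];
        u[x] = avg_round(p0[1], p1[1]);     /* (upper + lower + 1) >> 1 */
        v[x] = avg_round(p0[3], p1[3]);
    }
}
```

With chroma values 1 and 2, the truncating form gives 1 while the rounded form gives 2, so truncation biases chroma downward by half a step on average.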
> 
> Furthermore, the routine might have a minor gain from writing using
> movntq.
> 
> YV12 -> YUY2
> ------------
> The output from YUY2 mode would be nicer if you interpolated chroma with
> the line above or below instead of simply copying it. This will of course
> make the routine more complex, and probably slightly slower. It gives
> more correct chroma placement, but will slightly blur chroma after
> several conversions.
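A sketch of interpolated YV12 to YUY2 output for one line, in plain C. It assumes each 4:2:0 chroma row sits vertically midway between luma rows 2k and 2k+1 (the siting that averaging two lines on input produces), so the nearer chroma row is weighted 3/4 and the farther one 1/4; this is one common choice, not necessarily what XviD would use. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: chroma value for luma row `row`, column x, from a
   chroma plane with `crows` rows and row pitch `cstride`. The nearer chroma
   row gets weight 3/4, the farther one 1/4; rows outside the plane are
   clamped to the edge. */
static uint8_t chroma_at(const uint8_t *plane, size_t cstride, size_t crows,
                         size_t row, size_t x)
{
    size_t k = row / 2;
    size_t other;
    if (row % 2 == 0)
        other = (k > 0) ? k - 1 : 0;          /* even row: blend with row above */
    else
        other = (k + 1 < crows) ? k + 1 : k;  /* odd row: blend with row below */
    unsigned near_s = plane[k * cstride + x];
    unsigned far_s = plane[other * cstride + x];
    return (uint8_t)((3 * near_s + far_s + 2) >> 2);  /* rounded 3:1 blend */
}

/* Writes one YUY2 line (Y0 U Y1 V order) for luma row `row`; width even. */
static void yv12_row_to_yuy2(uint8_t *dst, const uint8_t *yrow,
                             const uint8_t *u, const uint8_t *v,
                             size_t cstride, size_t crows,
                             size_t row, size_t width)
{
    for (size_t x = 0; x < width / 2; x++) {
        dst[4 * x]     = yrow[2 * x];
        dst[4 * x + 1] = chroma_at(u, cstride, crows, row, x);
        dst[4 * x + 2] = yrow[2 * x + 1];
        dst[4 * x + 3] = chroma_at(v, cstride, crows, row, x);
    }
}
```

With chroma rows 100 and 200, luma rows 0 to 3 get 100, 125, 175 and 200, where plain copying gives 100, 100, 200, 200.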
> 
> 
> Many people still feed YUY2 to XviD when encoding, and some people
> also use YUY2 for output (because of overlay). So these things may
> affect a lot of people.
> 
>  
> Regards, Klaus Post
> AviSynth project
>  
> 
> -----Original Message-----
> From: Christoph Lampert [mailto:chl at math.uni-bonn.de] 
> Sent: 1 March 2003 15:42
> To: xvid-devel at xvid.org
> Subject: [XviD-devel] XVID profiling
> 
> 
> Hi,
> I got some profiling results for XVID, for those of you interested in
> MMXing a little more. So far, I have only checked encoding. From the
> logfile you can see:
> 
> With MMX, the SAD is always the slowest part: either sad16v_mmx because
> of INTER4V mode, or sad16bi_mmx because of B-frame interpolate/direct
> mode. Only the CheckCandidate routines in motion estimation seem like
> candidates for some speedup. They've indeed grown rather large.
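For readers new to these routines, a plain C reference of what a 16x16 SAD and its bidirectional variant compute. The function names are hypothetical, and the +1 rounding in the averaged B-frame prediction is an assumption. The MMX/XMM versions vectorize these loops; on XMM-capable CPUs, psadbw sums the absolute differences of eight byte pairs in a single instruction, which is one reason the XMM SADs are faster.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference: 16x16 sum of absolute differences between the
   current block and a reference block; `stride` is the row pitch of both. */
static uint32_t sad16_ref(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    uint32_t sad = 0;
    for (size_t j = 0; j < 16; j++)
        for (size_t i = 0; i < 16; i++)
            sad += (uint32_t)abs(cur[j * stride + i] - ref[j * stride + i]);
    return sad;
}

/* Bidirectional variant: compares against the rounded average of a forward
   and a backward reference (B-frame interpolate mode). The +1 rounding is
   an assumption. */
static uint32_t sad16bi_ref(const uint8_t *cur, const uint8_t *ref1,
                            const uint8_t *ref2, size_t stride)
{
    uint32_t sad = 0;
    for (size_t j = 0; j < 16; j++)
        for (size_t i = 0; i < 16; i++) {
            size_t o = j * stride + i;
            int pred = (ref1[o] + ref2[o] + 1) >> 1;
            sad += (uint32_t)abs(cur[o] - pred);
        }
    return sad;
}
```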
> 
> GOAL 0)   Clean up "CheckCandidate"-mechanism  (but that may influence 
>           ME structure, so it's not #1 on the list). 
> 
> 
> With XMM, all SADs are faster than with MMX, so CheckCandidate takes a
> relatively larger share of the time, in particular in B-frame mode.
> Without B-frames and Q-pel, memory transfer and interpolation become
> more important.
> 
> 
> 
> GOAL 1)  Speed up Mem-Transfers, in particular transfer_8to16sub (_mmx)
>          and yv12_to_yv12 (_xmm). Maybe those are candidates for
>          prefetch.
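A minimal C sketch of the kind of work a routine like transfer_8to16sub does: subtracting an 8x8 reference block from the current block into 16-bit residuals for the DCT. This is an illustration under that assumption, not XviD's routine; the name `residual_8x8` and the fixed output pitch of 8 are hypothetical. It does very little arithmetic per byte moved, which is what makes prefetching worth trying.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: 8x8 residual, cur - ref, widened from 8-bit pixels
   to 16-bit coefficients. `stride` is the pitch of cur and ref; the output
   block is stored contiguously with a pitch of 8. */
static void residual_8x8(int16_t *dct, const uint8_t *cur,
                         const uint8_t *ref, size_t stride)
{
    for (size_t j = 0; j < 8; j++)
        for (size_t i = 0; i < 8; i++)
            dct[j * 8 + i] = (int16_t)(cur[j * stride + i] - ref[j * stride + i]);
}
```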
> 
> 
> 
> For QPel, it becomes obvious that not everything is ASMed yet:
> interpolate16x16_lowpass_h_c and interpolate16x16_lowpass_v_c
> are obvious candidates for ASMing:
> 
> 
> GOAL 2)  Create SIMDed versions of interpolate16x16_lowpass_h_c 
>          and interpolate16x16_lowpass_v_c
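A scalar C sketch of a horizontal quarter-pel lowpass for one 16-pixel row, the kind of loop a SIMD version of interpolate16x16_lowpass_h_c would replace. It assumes the MPEG-4 8-tap half-sample filter (-1, 3, -6, 20, 20, -6, 3, -1)/32 with rounding (sum + 16 - rounding) >> 5 and clipping to 0..255. It also assumes `src` is readable from src[-3] to src[19] and omits the block-edge handling real code needs. The function name is hypothetical; a vertical version applies the same filter down a column.

```c
#include <stdint.h>

/* Hypothetical sketch: output x is the half-sample between src[x] and
   src[x + 1]. Reads src[-3] .. src[19]; `rounding` is 0 or 1. */
static void lowpass_h_row16(uint8_t *dst, const uint8_t *src, int rounding)
{
    for (int x = 0; x < 16; x++) {
        int sum = 20 * (src[x] + src[x + 1])
                -  6 * (src[x - 1] + src[x + 2])
                +  3 * (src[x - 2] + src[x + 3])
                -      (src[x - 3] + src[x + 4]);
        int v = sum + 16 - rounding;
        if (v < 0)
            v = 0;                    /* clip undershoot before shifting */
        v >>= 5;                      /* divide by 32 (sum of the taps) */
        dst[x] = (uint8_t)(v > 255 ? 255 : v);  /* clip overshoot */
    }
}
```

For a hard edge (src[0..7] = 0, src[8..] = 255), output 7 lands at 128, and taps that straddle the edge undershoot and are clipped to 0, which is why a SIMD version needs saturating arithmetic.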
> 
> 
> Also, the interpolate-average functions take quite a lot of time and seem
> to be MMX, not XMM.
> 
> 
> 
> GOAL 3) Create XMM versions of interpolate8x8_avg4_mmx
>                                interpolate8x8_6tap_lowpass_v_mmx
>                                interpolate8x8_avg2_mmx
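Scalar C references for the two averaging kernels in GOAL 3, as a starting point for XMM versions. The names are hypothetical, and the rounding convention ((a + b + 1 - rounding) >> 1 and (a + b + c + d + 2 - rounding) >> 2, with `rounding` as the MPEG-4 rounding-control bit) is an assumption about what the MMX routines implement.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 8x8 two-way average; `rounding` is 0 or 1. */
static void avg2_8x8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t stride, int rounding)
{
    for (size_t j = 0; j < 8; j++)
        for (size_t i = 0; i < 8; i++) {
            size_t o = j * stride + i;
            dst[o] = (uint8_t)((a[o] + b[o] + 1 - rounding) >> 1);
        }
}

/* Hypothetical 8x8 four-way average; `rounding` is 0 or 1. */
static void avg4_8x8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     const uint8_t *c, const uint8_t *d,
                     size_t stride, int rounding)
{
    for (size_t j = 0; j < 8; j++)
        for (size_t i = 0; i < 8; i++) {
            size_t o = j * stride + i;
            dst[o] = (uint8_t)((a[o] + b[o] + c[o] + d[o] + 2 - rounding) >> 2);
        }
}
```

With rounding = 0, avg2 is exactly what pavgb computes, one instruction per eight pixels. The rounding = 1 case and avg4 need a correction step, because chaining two pavgb operations rounds up twice: avg(avg(0, 0), avg(0, 1)) gives 1, while (0 + 0 + 0 + 1 + 2) >> 2 gives 0.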
> 
> 
> 
> 
> _______________________________________________
> XviD-devel mailing list
> XviD-devel at xvid.org
> http://list.xvid.org/mailman/listinfo/xvid-devel
>