[XviD-devel] XVID profiling
Christoph Lampert
chl at math.uni-bonn.de
Mon Mar 3 22:42:31 CET 2003
Btw.
is there any software out there which uses other colorspaces than
I420(YV12) or RGB24/16 for _input_ material?
And does Windows decoding (video player) include a step like yv12_to_yv12,
or is the image overlaid directly from internal buffers (as MPlayer does
using CSP_USER)?
gruel
On Mon, 3 Mar 2003, Klaus Post (KPO) wrote:
> I've been doing some work with your color conversion routines, and I
> noticed some minor details.
>
> YUY2 to YV12 conversion.
> You might be able to squeeze out a tiny bit of performance by doing an
> integer SSE version.
> The chroma could be averaged by using pavgb instead of unpacking to
> words, adding, and dividing by bitshifting. You should actually also
> round before bitshifting, as you are currently doing:
>
> chroma = (upper line + lower line) >> 1
>
> and not:
>
> chroma = (upper line + lower line + 1 ) >> 1
>
> pavgb does have "proper" rounding.
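
A scalar sketch of the difference, for illustration only. The function names are made up here and are not taken from the XviD sources; the rounding form is the per-byte result pavgb is documented to produce.

```c
#include <stdint.h>

/* Truncating average: (a + b) >> 1. A fractional .5 is always dropped. */
uint8_t avg_trunc(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b) >> 1);
}

/* Rounding average: (a + b + 1) >> 1, the per-byte result of pavgb. */
uint8_t avg_round(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1) >> 1);
}

/* Hypothetical helper: average two chroma rows into one output row. */
void average_chroma_rows(uint8_t *dst, const uint8_t *upper,
                         const uint8_t *lower, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = avg_round(upper[i], lower[i]);
}
```

The two versions differ by one whenever the sum of the inputs is odd, so the truncating form darkens chroma slightly on average.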
>
> Furthermore, the routine might have a minor gain from writing using
> movntq.
>
> YV12 -> YUY2
> ------------
> The output from YUY2 mode would be nicer if you interpolated chroma to
> the line above or below, and not simply copied it. This will of course
> make the routine more complex, and probably slightly slower. It ensures
> more correct chroma placement, but will slightly blur chroma after
> several conversions.
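
One way the interpolation might look in plain C, as a rough sketch. The 3:1 weighting between the nearest chroma row and its vertical neighbour is an assumption (the mail does not give weights), and the function names are hypothetical.

```c
#include <stdint.h>

/* Blend the nearest chroma sample with its vertical neighbour, weighted
 * 3:1 with rounding. The weights are an assumption for illustration. */
uint8_t blend_3_1(uint8_t nearest, uint8_t other)
{
    return (uint8_t)((3u * nearest + other + 2u) >> 2);
}

/* Pack one YUY2 row (Y0 U Y1 V ...) from a luma row and two chroma rows
 * per plane. width is the luma width and is assumed to be even. */
void yv12_row_to_yuy2(uint8_t *dst, const uint8_t *y_row,
                      const uint8_t *u_near, const uint8_t *u_far,
                      const uint8_t *v_near, const uint8_t *v_far,
                      int width)
{
    for (int x = 0; x < width; x += 2) {
        int c = x >> 1;                      /* chroma sample index */
        dst[2 * x + 0] = y_row[x];
        dst[2 * x + 1] = blend_3_1(u_near[c], u_far[c]);
        dst[2 * x + 2] = y_row[x + 1];
        dst[2 * x + 3] = blend_3_1(v_near[c], v_far[c]);
    }
}
```

For an even output line the caller would pass the chroma row above as the "far" row, and for an odd line the row below. Passing the same row twice reproduces the plain copy.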
>
>
> Many people still deliver YUY2, when encoding with XviD, and some people
> also use YUY2 for output (because of overlay). So these things might
> have influence on a lot of people.
>
>
> Regards, Klaus Post
> AviSynth project
>
>
> -----Original Message-----
> From: Christoph Lampert [mailto:chl at math.uni-bonn.de]
> Sent: 1. marts 2003 15:42
> To: xvid-devel at xvid.org
> Subject: [XviD-devel] XVID profiling
>
>
> Hi,
> I got some profiling results about XVID for those of you who are
> interested in MMXing a little more. So far, I just checked encoding.
> From the logfile you can see:
>
> With MMX, it's always the SAD that is slowest, either sad16v_mmx because
> of INTER4V-mode, or sad16bi_mmx because of b-frames interpolate/direct
> mode. Only CheckCandidates-Routines in motion-estimation seem like
> candidates for some speedup. They've indeed grown rather large.
>
> GOAL 0) Clean up "CheckCandidate"-mechanism (but that may influence
> ME structure, so it's not #1 on the list).
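
For readers new to the code, the SAD (sum of absolute differences) being profiled here is, in scalar form, roughly the following. This is a reference sketch with a made-up name and signature, not the actual XviD routine.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 16x16 block.
 * cur and ref point to the top-left pixel of each block;
 * stride is the distance in bytes between consecutive rows. */
uint32_t sad16_ref(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}
```

Motion estimation calls a function like this once per candidate vector, which is why it dominates the profile and why the candidate-checking code around it matters too.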
>
>
> With XMM, all SADs are faster than with mmx. CheckCandidate gets
> relatively more influence, in particular in Bframe mode. Without
> B-frames and Q-pel, mem transfer and interpolation become more
> important.
>
>
>
> GOAL 1) Speed up Mem-Transfers, in particular transfer_8to16sub (_mmx)
> and yv12_to_yv12 (_xmm). Maybe those are candidates for
> prefetch.
>
>
>
> For QPel, it becomes obvious that not everything is ASMed yet:
> interpolate16x16_lowpass_h_c and interpolate16x16_lowpass_v_c
> are obvious candidates for ASMing:
>
>
> GOAL 2) Create SIMDed versions of interpolate16x16_lowpass_h_c
> and interpolate16x16_lowpass_v_c
>
>
> also, interpolate-average functions take quite a lot of time and seem to
> be mmx, not xmm.
>
>
>
> GOAL 3) Create XMM versions of interpolate8x8_avg4_mmx
> interpolate8x8_6tap_lowpass_v_mmx
> interpolate8x8_avg2_mmx
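
As a scalar point of reference, a two-source average over an 8x8 block could be sketched as below. The name, the signature and the "rounding" parameter (0 or 1, following the usual MPEG-4 rounding-control form) are assumptions for illustration and are not copied from the XviD sources.

```c
#include <stdint.h>

/* Average two 8x8 blocks into dst.
 * rounding is 0 or 1: with 0 the result rounds up on .5,
 * with 1 it truncates. All three blocks share the same stride. */
void avg2_8x8_ref(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                  uint32_t stride, uint32_t rounding)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((src1[x] + src2[x] + 1u - rounding) >> 1);
        dst += stride;
        src1 += stride;
        src2 += stride;
    }
}
```

With rounding set to 0 each byte matches what pavgb computes, which is one reason an XMM version looks attractive for this family of functions.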
>
>
>
>
> _______________________________________________
> XviD-devel mailing list
> XviD-devel at xvid.org
> http://list.xvid.org/mailman/listinfo/xvid-devel
>