[XviD-devel] Profiling XVID, Part II
Christoph Lampert
chl at math.uni-bonn.de
Sat Mar 1 17:25:53 CET 2003
Hi, now the decoding part:
Without Qpel, the MMX and XMM profiles look rather similar:
yv12_to_yv12 and transfer8x8_copy dominate, together taking roughly
34-43% of the decoding time.
GOAL 4) _again_ try to squeeze the last bit out of yv12_to_yv12
and the transfer functions.
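To make GOAL 4 concrete: as far as I can tell, a YV12-to-YV12 "conversion" needs no colour math and comes down to copying the Y, U and V planes, so the hot loop is a strided row copy. A minimal C sketch of one such plane copy follows. copy_plane and its parameters are hypothetical names for illustration, not the actual XviD routine, and the single-memcpy fast path assumes both strides equal the width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the XviD source): copy one image plane
 * of width x height bytes between buffers with independent strides. */
static void copy_plane(uint8_t *dst, int dst_stride,
                       const uint8_t *src, int src_stride,
                       int width, int height)
{
    /* Assumed fast path: both planes are contiguous, so the whole
     * plane can be moved with a single bulk copy. */
    if (dst_stride == width && src_stride == width) {
        memcpy(dst, src, (size_t)width * (size_t)height);
        return;
    }

    /* General case: copy row by row, skipping the stride padding. */
    for (int y = 0; y < height; y++) {
        memcpy(dst, src, (size_t)width);
        dst += dst_stride;
        src += src_stride;
    }
}
```

A full frame would call this three times: once for Y at full size and once each for U and V at half width and half height. This is the baseline a hand-written MMX/XMM copy has to beat.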
With Qpel, it's the interpolation again.
GOAL 5) _again_ try to create (partial?) SIMD version of
interpolate16x16_lowpass_v_c
interpolate16x16_lowpass_h_c
interpolate8x8_lowpass_v_c
interpolate8x8_lowpass_h_c
It seems that with those we would optimize encoder and decoder at the same
time, a rare opportunity.
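For reference, the work inside the vertical lowpass functions can be sketched in plain C. This assumes the MPEG-4 quarter-pel half-sample filter with taps (-1, 3, -6, 20, 20, -6, 3, -1)/32 and mirrored block edges. lowpass_v_sketch, mirror and clip255 are hypothetical names, and the real interpolate*_lowpass_v_c routines are hand-unrolled and may differ in detail. The loops are ordered so that the pixel loop runs across a row with fixed taps, which is the part a (partial) SIMD version would replace.

```c
#include <stdint.h>

/* Assumed MPEG-4 qpel half-sample taps, scaled by 32. */
static const int TAPS[8] = { -1, 3, -6, 20, 20, -6, 3, -1 };

/* Mirror a row index into [0, n]; the filter reads n + 1 input rows. */
static int mirror(int i, int n)
{
    if (i < 0) return -i - 1;
    if (i > n) return 2 * n + 1 - i;
    return i;
}

/* Clamp to the 8-bit pixel range. */
static uint8_t clip255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical sketch: vertical lowpass over a block of `width` columns
 * and `n` output rows (n = 8 or 16). dst and src share the same stride. */
static void lowpass_v_sketch(uint8_t *dst, const uint8_t *src,
                             int stride, int width, int n, int rounding)
{
    for (int y = 0; y < n; y++) {
        /* Resolve the 8 contributing source rows once per output row. */
        const uint8_t *row[8];
        for (int k = 0; k < 8; k++)
            row[k] = src + mirror(y - 3 + k, n) * stride;

        /* Same taps for every pixel of the row: SIMD-friendly loop. */
        for (int x = 0; x < width; x++) {
            int sum = 16 - rounding;
            for (int k = 0; k < 8; k++)
                sum += TAPS[k] * row[k][x];
            if (sum < 0)
                sum = 0;          /* avoid shifting a negative value */
            dst[y * stride + x] = clip255(sum >> 5);
        }
    }
}
```

For an 8x8 block this would be called with width = 8 and n = 8, reading 9 source rows. The horizontal variant is the same filter with rows and columns swapped, which is why one SIMD kernel design could serve both encoder and decoder.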
gruel
-------------- next part --------------
-------- no Bframes , no Qpel
     %     self               self
  time  seconds    calls   us/call  name
 17.96     0.44                     yv12_to_yv12_mmx
 16.33     0.40                     transfer8x8_copy_mmx
 11.84     0.29                     interpolate8x8_halfpel_hv_mmx
  9.80     0.24    79948      3.00  decoder_mbinter
  6.94     0.17                     interpolate8x8_halfpel_h_mmx
  5.71     0.14   151602      0.92  get_inter_block
  5.31     0.13                     idct_mmx
  4.90     0.12        6  20000.00  image_create
  4.49     0.11       99   1111.11  decoder_pframe
  3.27     0.08                     interpolate8x8_halfpel_v_mmx
  2.04     0.05   162000      0.31  check_resync_marker
  2.04     0.05                     dequant_inter_mmx
-------- 2 Bframes , no Qpel
     %     self               self
  time  seconds    calls   us/call  name
 17.67     0.50                     transfer8x8_copy_mmx
 16.96     0.48                     yv12_to_yv12_mmx
  7.77     0.22                     interpolate8x8_halfpel_h_mmx
  7.77     0.22                     interpolate8x8_halfpel_hv_mmx
  7.42     0.21       66   3181.82  decoder_bframe
  4.95     0.14    44069      3.18  decoder_bf_interpolate_mbinter
  3.89     0.11        6  18333.33  image_create
  3.89     0.11                     interpolate8x8_halfpel_v_mmx
  3.18     0.09      165    545.45  image_setedges
  3.18     0.09                     idct_mmx
  3.18     0.09                     interpolate8x8_avg2_mmx
  2.83     0.08    75734      1.06  get_inter_block
  2.83     0.08    62851      1.27  decoder_bf_mbinter
  2.83     0.08    22881      3.50  decoder_mbinter
  2.12     0.06    43608      1.38  predict_acdc
-------- no Bframes , Qpel
     %     self               self
  time  seconds    calls   us/call  name
 22.40     0.97    48816     19.87  interpolate16x16_lowpass_v_c
 20.32     0.88    47877     18.38  interpolate16x16_lowpass_h_c
 11.78     0.51                     yv12_to_yv12_mmx
  8.31     0.36                     transfer8x8_copy_mmx
  5.77     0.25    79297      3.15  decoder_mbinter
  3.23     0.14                     interpolate8x8_avg2_mmx
  3.00     0.13     5895     22.05  interpolate16x16_lowpass_hv_c
  3.00     0.13       99   1313.13  decoder_pframe
  3.00     0.13                     dequant_inter_mmx
  2.77     0.12        6  20000.00  image_create
  2.54     0.11                     interpolate8x8_halfpel_hv_mmx
  2.31     0.10                     idct_mmx
-------- 2 Bframes , Qpel
     %     self               self
  time  seconds    calls   us/call  name
 15.01     0.83   179637      4.62  interpolate8x8_lowpass_v_c
 13.02     0.72   138353      5.20  interpolate8x8_lowpass_h_c
  8.86     0.49    45350     10.80  decoder_bf_interpolate_mbinter
  8.14     0.45                     yv12_to_yv12_mmx
  6.87     0.38    19321     19.67  interpolate16x16_lowpass_v_c
  6.69     0.37    16654     22.22  interpolate16x16_lowpass_h_c
  6.69     0.37                     transfer8x8_copy_mmx
  4.52     0.25                     interpolate8x8_avg2_mmx
  3.62     0.20                     interpolate8x8_halfpel_hv_mmx
  3.44     0.19    39827      4.77  interpolate8x8_lowpass_hv_c
  3.07     0.17       66   2575.76  decoder_bframe
  2.53     0.14    61570      2.27  decoder_bf_mbinter
  2.17     0.12       33   3636.36  decoder_pframe
-------------- next part --------------
-------- no Bframes , no Qpel
     %     self               self
  time  seconds    calls   us/call  name
 22.30     0.33                     yv12_to_yv12_xmm
 14.19     0.21                     transfer8x8_copy_mmx
 10.14     0.15    79948      1.88  decoder_mbinter
  6.76     0.10       99   1010.10  decoder_pframe
  6.76     0.10                     interpolate8x8_halfpel_hv_xmm
  6.08     0.09                     interpolate8x8_halfpel_h_xmm
  5.41     0.08   151602      0.53  get_inter_block
  4.73     0.07        6  11666.67  image_create
  4.05     0.06       99    606.06  image_setedges
  4.05     0.06                     interpolate8x8_halfpel_v_xmm
  2.70     0.04    84739      0.47  get_motion_vector
  2.70     0.04                     idct_xmm
  2.03     0.03                     transfer_16to8add_mmx
-------- 2 Bframes , no Qpel
     %     self               self
  time  seconds    calls   us/call  name
 22.35     0.40                     transfer8x8_copy_mmx
 20.67     0.37                     yv12_to_yv12_xmm
  6.70     0.12    44069      2.72  decoder_bf_interpolate_mbinter
  6.15     0.11      165    666.67  image_setedges
  6.15     0.11                     interpolate8x8_halfpel_hv_xmm
  4.47     0.08       66   1212.12  decoder_bframe
  3.91     0.07        6  11666.67  image_create
  3.91     0.07                     idct_xmm
  3.91     0.07                     interpolate8x8_halfpel_v_xmm
  3.35     0.06    62851      0.95  decoder_bf_mbinter
  2.79     0.05                     interpolate8x8_halfpel_h_xmm
  2.23     0.04    75734      0.53  get_inter_block
-------- no Bframes , Qpel
     %     self               self
  time  seconds    calls   us/call  name
 27.10     0.71    48816     14.54  interpolate16x16_lowpass_v_c
 14.50     0.38    47877      7.94  interpolate16x16_lowpass_h_c
 11.83     0.31                     yv12_to_yv12_xmm
 11.07     0.29                     transfer8x8_copy_mmx
  5.73     0.15       99   1515.15  decoder_pframe
  4.96     0.13    79297      1.64  decoder_mbinter
  2.67     0.07   148833      0.47  get_inter_block
  2.67     0.07                     interpolate8x8_avg2_mmx
  2.67     0.07                     interpolate8x8_halfpel_hv_xmm
  2.29     0.06     5895     10.18  interpolate16x16_lowpass_hv_c
  2.29     0.06        6  10000.00  image_create
-------- 2 Bframes , Qpel
     %     self               self
  time  seconds    calls   us/call  name
 19.09     0.63   179637      3.51  interpolate8x8_lowpass_v_c
 15.15     0.50   138353      3.61  interpolate8x8_lowpass_h_c
 11.21     0.37                     yv12_to_yv12_xmm
  8.18     0.27    16654     16.21  interpolate16x16_lowpass_h_c
  8.18     0.27                     transfer8x8_copy_mmx
  6.06     0.20    19321     10.35  interpolate16x16_lowpass_v_c
  3.03     0.10    45350      2.21  decoder_bf_interpolate_mbinter
  3.03     0.10       66   1515.15  decoder_bframe
  2.73     0.09    39827      2.26  interpolate8x8_lowpass_hv_c
  2.42     0.08      165    484.85  image_setedges
  2.12     0.07        6  11666.67  image_create
  2.12     0.07                     interpolate8x8_halfpel_hv_xmm