[XviD-devel] XVID profiling
Christoph Lampert
chl at math.uni-bonn.de
Sat Mar 1 15:42:07 CET 2003
Hi,
I got some profiling results for XviD, for those of you who are interested in
MMXing a little more. So far, I have only checked encoding. From the log files
you can see:
With MMX, a SAD routine always takes the most time: either sad16v_mmx,
because of INTER4V mode, or sad16bi_mmx, because of the B-frame
interpolate/direct modes. Apart from the SADs, only the CheckCandidate
routines in motion estimation look like candidates for some speedup; they
have indeed grown rather large.
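For readers less familiar with the kernels at the top of these profiles, here is a plain-C sketch of what the three 16x16 SAD variants compute. The function names and signatures are hypothetical, chosen for illustration rather than copied from XviD, and the round-up average in sad16bi_ref is an assumption. It also hints at why sad16bi is expensive: it reads two reference blocks and averages them before taking differences.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical plain-C sketches of the 16x16 SAD kernels in the profiles.
 * Names and signatures are illustrative, not XviD's actual API. */

/* sad16: sum of absolute differences between two 16x16 blocks. */
uint32_t sad16_ref(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* sad16v: the same total, but the four 8x8 quadrant SADs are also returned,
 * so an INTER4V decision (one vector per 8x8 block) can reuse them. */
uint32_t sad16v_ref(const uint8_t *cur, const uint8_t *ref, int stride,
                    uint32_t sad8[4])
{
    sad8[0] = sad8[1] = sad8[2] = sad8[3] = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int quadrant = (y / 8) * 2 + (x / 8);   /* 0 1 / 2 3 */
            sad8[quadrant] +=
                (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
        }
    return sad8[0] + sad8[1] + sad8[2] + sad8[3];
}

/* sad16bi: SAD against the average of a forward and a backward reference
 * block (B-frame interpolate/direct modes). Rounding up is assumed here. */
uint32_t sad16bi_ref(const uint8_t *cur, const uint8_t *fwd,
                     const uint8_t *bwd, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int i = y * stride + x;
            int pred = (fwd[i] + bwd[i] + 1) >> 1;   /* bidirectional prediction */
            sad += (uint32_t)abs(cur[i] - pred);
        }
    return sad;
}
```

The quadrant sums in sad16v_ref come essentially for free with the 16x16 total, which is what makes a combined kernel attractive for INTER4V search.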
GOAL 0) Clean up the "CheckCandidate" mechanism (but that may affect the
ME structure, so it's not #1 on the list).
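GOAL 0 is about structure rather than instruction-level speed. As a purely hypothetical illustration, not XviD's design, the per-candidate work can be reduced to a bounds check, a block-matching cost, a motion-vector cost and a compare-and-update, with the block-matching part passed in as a callback. The CheckCandidate16/8/16no4v/Int/Directno4v variants would then differ only in that callback. The struct fields, mv_bits() and toy_cost() are assumptions made up for this example.

```c
#include <stdint.h>

/* Hypothetical search state; every name here is illustrative. */
typedef struct {
    int best_x, best_y;               /* best vector so far */
    uint32_t best_cost;               /* its SAD + weighted vector cost */
    int pred_x, pred_y;               /* motion vector predictor */
    uint32_t lambda;                  /* weight of the vector-cost term */
    int min_x, max_x, min_y, max_y;   /* allowed search window */
} SearchState;

/* Block-matching cost of one candidate vector; in an encoder this would wrap
 * sad16, sad16v, sad8 or sad16bi plus the matching interpolation. */
typedef uint32_t (*MatchCost)(const void *ctx, int mv_x, int mv_y);

/* Crude stand-in for the VLC length of a motion-vector difference. */
static uint32_t mv_bits(int dx, int dy)
{
    return (uint32_t)((dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy));
}

/* Evaluate one candidate and keep it if it beats the current best.
 * Returns 1 when the best vector changed, 0 otherwise. */
int check_candidate(SearchState *s, MatchCost cost, const void *ctx,
                    int x, int y)
{
    if (x < s->min_x || x > s->max_x || y < s->min_y || y > s->max_y)
        return 0;                     /* outside the search window */
    uint32_t c = cost(ctx, x, y) +
                 s->lambda * mv_bits(x - s->pred_x, y - s->pred_y);
    if (c >= s->best_cost)
        return 0;
    s->best_cost = c;
    s->best_x = x;
    s->best_y = y;
    return 1;
}

/* Toy cost for demonstration only: pretends the true motion is (3, -2). */
uint32_t toy_cost(const void *ctx, int x, int y)
{
    (void)ctx;
    return (uint32_t)(10 * ((x - 3) * (x - 3) + (y + 2) * (y + 2)));
}
```

One trade-off to keep in mind: the profiles show these routines being called hundreds of thousands to millions of times per run, so an indirect call per candidate is not free; a real clean-up might prefer inlining or macro-generated variants over a function pointer.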
With XMM, all SADs are faster than with MMX, so CheckCandidate gets
relatively more weight, in particular in B-frame mode. Without B-frames
and QPel, memory transfers and interpolation become more important.
GOAL 1) Speed up memory transfers, in particular transfer_8to16sub (_mmx)
and yv12_to_yv12 (_xmm). Maybe those are candidates for prefetching.
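To make GOAL 1 concrete: transfer_8to16sub produces (at least) the 16-bit residual cur - ref of an 8x8 block for the DCT, and yv12_to_yv12 presumably copies the planes of a YV12 image, so both are dominated by memory traffic rather than arithmetic. The sketch below shows the residual step and a plane copy with software prefetch. The function names, the 64-byte cache line and the 4-row look-ahead are assumptions; hardware prefetchers may already cover a sequential copy, so whether explicit prefetch pays off would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Residual for one 8x8 inter block: 16-bit cur - ref, ready for the
 * forward DCT. Hypothetical name and signature. */
void residual_8x8(int16_t dct[64], const uint8_t *cur, const uint8_t *ref,
                  int stride)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dct[y * 8 + x] =
                (int16_t)(cur[y * stride + x] - ref[y * stride + x]);
}

#define CACHE_LINE 64     /* assumed cache-line size in bytes */
#define PREFETCH_ROWS 4   /* assumed look-ahead; would need tuning */

/* Copy one image plane row by row, asking the CPU to start fetching the
 * row PREFETCH_ROWS ahead, one hint per cache line. __builtin_prefetch is
 * a GCC/Clang builtin (address, 0 = read, 0 = no temporal locality). */
void copy_plane_prefetch(uint8_t *dst, int dst_stride, const uint8_t *src,
                         int src_stride, int width, int height)
{
    for (int y = 0; y < height; y++) {
        if (y + PREFETCH_ROWS < height) {
            const uint8_t *ahead =
                src + (ptrdiff_t)(y + PREFETCH_ROWS) * src_stride;
            for (int x = 0; x < width; x += CACHE_LINE)
                __builtin_prefetch(ahead + x, 0, 0);
        }
        memcpy(dst + (ptrdiff_t)y * dst_stride,
               src + (ptrdiff_t)y * src_stride, (size_t)width);
    }
}
```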
For QPel, it becomes obvious that not everything is ASMed yet:
interpolate16x16_lowpass_h_c and interpolate16x16_lowpass_v_c
are obvious candidates for ASMing.
GOAL 2) Create SIMDed versions of interpolate16x16_lowpass_h_c
and interpolate16x16_lowpass_v_c
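For GOAL 2, this is roughly the arithmetic that interpolate16x16_lowpass_h_c and _v_c have to perform, written as a hypothetical plain-C reference rather than a copy of XviD's code. It assumes the MPEG-4 quarter-pel filter (-1, 3, -6, 20, 20, -6, 3, -1)/32 with samples mirrored at the block edge; with that mirroring the first output works out to (14*s0 + 23*s1 - 7*s2 + 3*s3 - s4 + 16 - rounding) >> 5. The intermediate sums stay within a signed 16-bit range, so a SIMD version could run the inner loop for several outputs at once in 16-bit lanes; the mirrored first and last outputs are what make it awkward.

```c
#include <stdint.h>

/* MPEG-4 quarter-pel 8-tap lowpass: (-1, 3, -6, 20, 20, -6, 3, -1) / 32. */
static const int qpel_taps[8] = { -1, 3, -6, 20, 20, -6, 3, -1 };

/* Filter n outputs from n + 1 inputs spaced src_step apart. Positions
 * outside 0..n are mirrored at the block edge. rounding is 0 or 1. */
static void lowpass_line(uint8_t *dst, int dst_step, const uint8_t *src,
                         int src_step, int n, int rounding)
{
    for (int i = 0; i < n; i++) {
        int sum = 0;
        for (int k = 0; k < 8; k++) {
            int p = i + k - 3;                /* may fall outside 0..n */
            if (p < 0) p = -1 - p;            /* mirror at left/top edge */
            if (p > n) p = 2 * n + 1 - p;     /* mirror at right/bottom edge */
            sum += qpel_taps[k] * src[p * src_step];
        }
        int v = sum + 16 - rounding;
        v = v < 0 ? 0 : v >> 5;               /* clamp before shifting */
        dst[i * dst_step] = (uint8_t)(v > 255 ? 255 : v);
    }
}

/* Horizontal pass over a 16x16 block: every row reads 17 source pixels. */
void lowpass16x16_h(uint8_t *dst, const uint8_t *src, int stride,
                    int rounding)
{
    for (int y = 0; y < 16; y++)
        lowpass_line(dst + y * stride, 1, src + y * stride, 1, 16, rounding);
}

/* Vertical pass: the same filter run down each of the 16 columns. */
void lowpass16x16_v(uint8_t *dst, const uint8_t *src, int stride,
                    int rounding)
{
    for (int x = 0; x < 16; x++)
        lowpass_line(dst + x, stride, src + x, stride, 16, rounding);
}
```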
Also, the interpolation and averaging functions take quite a lot of time,
and they only seem to exist as MMX, not XMM, versions.
GOAL 3) Create XMM versions of interpolate8x8_avg4_mmx,
        interpolate8x8_6tap_lowpass_v_mmx and
        interpolate8x8_avg2_mmx.
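As a sketch of what GOAL 3 involves, assuming the usual MPEG-4 rounding convention (rounding control 0 or 1), avg2 and avg4 compute the averages below. The main tool an XMM version would gain is pavgb, which averages bytes with rounding up, whereas plain MMX has to unpack to 16-bit words first. The rounding == 1 case can still stay in bytes, as avg_round_down shows. Note that nesting pavgb for avg4 is not bit-exact (a = 1, b = c = d = 0 gives 1 instead of 0), so avg4 would need a correction term or a widening path. All names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical plain-C references for 8x8 averaging; rounding is the
 * MPEG-4 rounding control (0 or 1). */
void avg2_8x8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
              int stride, int rounding)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            int i = y * stride + x;
            dst[i] = (uint8_t)((a[i] + b[i] + 1 - rounding) >> 1);
        }
}

void avg4_8x8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
              const uint8_t *c, const uint8_t *d, int stride, int rounding)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            int i = y * stride + x;
            dst[i] = (uint8_t)((a[i] + b[i] + c[i] + d[i] + 2 - rounding) >> 2);
        }
}

/* Scalar model of the pavgb instruction: byte average, rounding up. */
static uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Round-down average from pavgb without widening to 16 bits: when a + b is
 * odd, pavgb rounded up by one, and (a ^ b) & 1 is exactly that bit. */
uint8_t avg_round_down(uint8_t a, uint8_t b)
{
    return (uint8_t)(pavgb(a, b) - ((a ^ b) & 1));
}
```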
-------------- next part --------------
--- MMX, no B-frames, no QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 16.36     0.89                      sad16v_mmx
  8.27     0.45                      sad16_mmx
  6.99     0.38                      interpolate8x8_halfpel_h_mmx
  6.43     0.35                      sad8_mmx
  5.51     0.30                      interpolate8x8_halfpel_hv_mmx
  5.33     0.29                      transfer_8to16sub_mmx
  4.23     0.23                      yv12_to_yv12_mmx
  4.04     0.22      43643     5.04  SearchP
  3.86     0.21     811072     0.26  CheckCandidate8
  3.86     0.21         11 19090.91  image_create
  3.49     0.19      38715     4.91  CodeBlockInter
  3.49     0.19                      fdct_mmx
  2.94     0.16     558781     0.29  CheckCandidate16
  2.94     0.16         49  3265.31  FrameCodeP
  2.76     0.15                      interpolate8x8_halfpel_v_mmx
  2.02     0.11     101384     1.08  Search8
--- MMX, 2 B-frames, no QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 26.36     2.38                      sad16bi_mmx
 14.29     1.29                      sad16_mmx
  6.42     0.58                      interpolate8x8_halfpel_hv_mmx
  5.43     0.49                      interpolate8x8_halfpel_h_mmx
  3.88     0.35     621052     0.56  CheckCandidateInt
  3.77     0.34     924837     0.37  CheckCandidate16no4v
  3.10     0.28         15 18666.67  image_create
  2.99     0.27                      sad16v_mmx
  2.88     0.26     371282     0.70  CheckCandidateDirectno4v
  2.88     0.26                      interpolate8x8_halfpel_v_mmx
  2.33     0.21                      yv12_to_yv12_mmx
  2.10     0.19                      fdct_mmx
--- MMX, no B-frames, QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 15.76     1.76                      sad16v_mmx
  9.58     1.07                      interpolate8x8_avg4_mmx
  7.34     0.82    2092385     0.39  CheckCandidate16
  6.09     0.68                      interpolate8x8_6tap_lowpass_v_mmx
  5.10     0.57    1702784     0.33  CheckCandidate8
  4.39     0.49     849660     0.58  Interpolate8x8qpel
  4.30     0.48                      interpolate8x8_avg2_mmx
  4.12     0.46      23198    19.83  interpolate16x16_lowpass_v_c
  4.12     0.46                      sad16_mmx
  3.67     0.41                      interpolate8x8_6tap_lowpass_h_mmx
  3.49     0.39      22136    17.62  interpolate16x16_lowpass_h_c
  3.49     0.39                      sad8_mmx
  2.86     0.32      43974     7.28  SearchP
  2.60     0.29     350976     0.83  Interpolate16x16qpel
  2.60     0.29                      fdct_mmx
  2.33     0.26      38549     6.74  CodeBlockInter
--- MMX, 2 B-frames, QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 15.23     2.92                      sad16bi_mmx
 12.62     2.42                      interpolate8x8_avg4_mmx
  8.82     1.69                      interpolate8x8_avg2_mmx
  8.76     1.68                      sad16_mmx
  5.84     1.12                      interpolate8x8_6tap_lowpass_v_mmx
  5.43     1.04    2228765     0.47  Interpolate16x16qpel
  5.37     1.03    1637052     0.63  CheckCandidate16no4v
  4.33     0.83                      sad16v_mmx
  4.17     0.80                      interpolate8x8_6tap_lowpass_h_mmx
  2.45     0.47      79254     5.93  interpolate8x8_lowpass_v_c
  2.40     0.46     951651     0.48  CheckCandidateInt
  2.24     0.43     971364     0.44  CheckCandidate16
  2.03     0.39     500954     0.78  CheckCandidateDirectno4v
-------------- next part --------------
--- XMM, no B-frames, no QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 13.59     0.59                      sad16v_xmm
  7.14     0.31                      sad16_xmm
  6.91     0.30                      interpolate8x8_halfpel_h_xmm
  5.76     0.25    1246016     0.20  CheckCandidate8
  5.76     0.25                      transfer_8to16sub_mmx
  5.76     0.25                      yv12_to_yv12_xmm
  5.53     0.24     793013     0.30  CheckCandidate16
  5.30     0.23                      interpolate8x8_halfpel_hv_xmm
  4.84     0.21                      fdct_mmx
  4.38     0.19                      sad8_xmm
  3.69     0.16      62169     2.57  SearchP
  3.46     0.15      60019     2.50  CodeBlockInter
  3.00     0.13         69  1884.06  FrameCodeP
  3.00     0.13         11 11818.18  image_create
  2.76     0.12     155752     0.77  Search8
  2.76     0.12                      interpolate8x8_halfpel_v_xmm
  2.07     0.09         69  1304.35  image_interpolate
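The MMX and XMM attachments are not directly comparable in raw seconds: in the no-B-frame runs FrameCodeP was called 49 times with MMX and 69 times with XMM (SearchP 43643 vs. 62169 times), so the XMM run encoded more P-frames. A small sketch that normalizes the SAD rows to seconds per P-frame before comparing, assuming both runs used similar material; the struct and macro names are made up for the example.

```c
#include <stdio.h>

/* Self seconds of the SAD kernels in the two "no B-frames, no QPel" runs. */
typedef struct { const char *name; double mmx_s, xmm_s; } SadRow;

static const SadRow sad_rows[] = {
    { "sad16v", 0.89, 0.59 },
    { "sad16",  0.45, 0.31 },
    { "sad8",   0.35, 0.19 },
};

#define MMX_P_FRAMES 49   /* FrameCodeP calls in the MMX run */
#define XMM_P_FRAMES 69   /* FrameCodeP calls in the XMM run */

/* XMM-over-MMX speedup after normalizing each run to seconds per P-frame. */
double per_frame_speedup(const SadRow *r)
{
    double mmx = r->mmx_s / MMX_P_FRAMES;
    double xmm = r->xmm_s / XMM_P_FRAMES;
    return mmx / xmm;
}

void print_speedups(void)
{
    for (size_t i = 0; i < sizeof sad_rows / sizeof sad_rows[0]; i++)
        printf("%-7s %.2fx\n", sad_rows[i].name,
               per_frame_speedup(&sad_rows[i]));
}
```

Under that assumption it gives about 2.1x for sad16v, 2.0x for sad16 and 2.6x for sad8. With gprof's sampling (typically 10 ms per sample), 0.19 s is only about 19 samples, so these ratios are rough.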
--- XMM, 2 B-frames, no QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 12.15     0.73                      sad16_xmm
 12.15     0.73                      sad16bi_xmm
  9.82     0.59                      interpolate8x8_halfpel_h_xmm
  5.66     0.34    1305381     0.26  CheckCandidate16no4v
  5.66     0.34                      interpolate8x8_halfpel_hv_xmm
  5.32     0.32     797301     0.40  CheckCandidateInt
  4.16     0.25                      sad16v_xmm
  3.99     0.24     542728     0.44  CheckCandidateDirectno4v
  3.99     0.24                      fdct_mmx
  3.33     0.20                      yv12_to_yv12_xmm
  3.00     0.18        113  1592.92  image_interpolate
  3.00     0.18         15 12000.00  image_create
  2.66     0.16      80842     1.98  SearchBF
--- XMM, no B-frames, QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 12.06     1.07                      sad16v_xmm
  9.02     0.80    2947722     0.27  CheckCandidate16
  7.67     0.68                      interpolate8x8_6tap_lowpass_v_mmx
  6.09     0.54                      interpolate8x8_avg4_mmx
  5.98     0.53      38312    13.83  interpolate16x16_lowpass_h_c
  5.52     0.49      42446    11.54  interpolate16x16_lowpass_v_c
  5.41     0.48    2717632     0.18  CheckCandidate8
  4.85     0.43                      interpolate8x8_6tap_lowpass_h_mmx
  3.61     0.32    1357487     0.24  Interpolate8x8qpel
  3.27     0.29                      sad16_xmm
  3.16     0.28                      sad8_xmm
  2.82     0.25      62169     4.02  SearchP
  2.82     0.25                      interpolate8x8_avg2_mmx
  2.82     0.25                      yv12_to_yv12_xmm
  2.59     0.23     497036     0.46  Interpolate16x16qpel
  2.25     0.20                      fdct_mmx
  2.14     0.19      60256     3.15  CodeBlockInter
--- XMM, 2 B-frames, QPel ---
     %     self                self
  time  seconds      calls  us/call  name
 14.13     1.93                      interpolate8x8_avg4_mmx
  9.81     1.34                      interpolate8x8_6tap_lowpass_v_mmx
  7.47     1.02                      interpolate8x8_avg2_mmx
  7.10     0.97                      sad16bi_xmm
  7.03     0.96                      sad16_xmm
  6.00     0.82    3317874     0.25  Interpolate16x16qpel
  4.03     0.55                      interpolate8x8_6tap_lowpass_h_mmx
  3.88     0.53    2255011     0.24  CheckCandidate16no4v
  3.66     0.50                      sad16v_xmm
  3.44     0.47     147554     3.19  interpolate8x8_lowpass_v_c
  3.00     0.41    1323822     0.31  CheckCandidateInt
  2.93     0.40     135325     2.96  interpolate8x8_lowpass_h_c
  2.49     0.34    1313279     0.26  CheckCandidate16