[XviD-devel] XVID profiling

Christoph Lampert chl at math.uni-bonn.de
Sat Mar 1 15:42:07 CET 2003


Hi,
I have some profiling results for XVID, for those of you who are interested
in MMXing a little more. So far, I have only checked encoding. From the
logfiles you can see:

With MMX, the slowest routine is always a SAD: either sad16v_mmx, because
of INTER4V mode, or sad16bi_mmx, because of the B-frame interpolate/direct
modes. Apart from the SADs, only the CheckCandidate routines in motion
estimation look like candidates for a speedup. They have indeed grown
rather large.
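For readers new to these routines: a SAD (sum of absolute differences) compares a 16x16 block of the current frame against a candidate block in the reference frame, and motion estimation calls it once per candidate vector. The plain-C sketch below shows what the three hot variants compute. The function names and simplified signatures are hypothetical, not XviD's prototypes. Two details are assumptions based on the roles described above: sad16v also returns the four 8x8 sub-SADs (needed for the INTER4V decision), and sad16bi compares against the rounded-up average of two references.

```c
#include <stdint.h>
#include <stdlib.h>

/* Plain 16x16 SAD: the work sad16_mmx/_xmm vectorize. */
static uint32_t sad16_ref(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* 16x16 SAD that also fills in the four 8x8 sub-SADs (assumed INTER4V use). */
static uint32_t sad16v_ref(const uint8_t *cur, const uint8_t *ref, int stride,
                           uint32_t sad8[4])
{
    sad8[0] = sad8[1] = sad8[2] = sad8[3] = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int blk = (y / 8) * 2 + (x / 8);   /* 0 1 / 2 3 quadrant layout */
            sad8[blk] += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
        }
    return sad8[0] + sad8[1] + sad8[2] + sad8[3];
}

/* Bidirectional SAD: compare against the rounded average of two references. */
static uint32_t sad16bi_ref(const uint8_t *cur, const uint8_t *ref1,
                            const uint8_t *ref2, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int i = y * stride + x;
            int avg = (ref1[i] + ref2[i] + 1) >> 1;
            sad += (uint32_t)abs(cur[i] - avg);
        }
    return sad;
}
```

Every candidate costs a full pass over 256 pixel pairs (with two reference reads per pixel for sad16bi), which is why these routines dominate whenever many candidates are checked. The XMM instruction set adds psadbw, which sums the absolute differences of eight byte pairs in one instruction; that is presumably where most of the XMM SAD speedup comes from.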

GOAL 0)   Clean up the CheckCandidate mechanism (but that may affect the
          ME structure, so it's not #1 on the list).
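One possible reading of GOAL 0 is that a candidate check should do only four things: a range test, a SAD, a motion-vector penalty, and a best-so-far update. The sketch below is a hypothetical, stripped-down 16x16 check in plain C. SearchState, sad16_plain, mv_cost and the linear penalty are illustrative assumptions, not XviD's data structures or rate model.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical search state; field names are illustrative, not XviD's. */
typedef struct {
    const uint8_t *cur;              /* current 16x16 block */
    const uint8_t *ref;              /* reference plane at the search origin */
    int stride;                      /* shared by cur and ref in this sketch */
    int min_x, max_x, min_y, max_y;  /* allowed vector range (full pel) */
    int pred_x, pred_y;              /* predicted vector */
    uint32_t lambda;                 /* weight of the vector-cost term */
    uint32_t best_cost;
    int best_x, best_y;
} SearchState;

static uint32_t sad16_plain(const uint8_t *a, const uint8_t *b, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs(a[y * stride + x] - b[y * stride + x]);
    return sad;
}

/* Hypothetical stand-in for the bits needed to code a vector difference. */
static uint32_t mv_cost(int dx, int dy)
{
    return (uint32_t)(abs(dx) + abs(dy));
}

/* One candidate check: range test, SAD, vector penalty, keep the best. */
static void check_candidate16(SearchState *s, int x, int y)
{
    if (x < s->min_x || x > s->max_x || y < s->min_y || y > s->max_y)
        return;
    uint32_t cost = sad16_plain(s->cur, s->ref + y * s->stride + x, s->stride)
                  + s->lambda * mv_cost(x - s->pred_x, y - s->pred_y);
    if (cost < s->best_cost) {
        s->best_cost = cost;
        s->best_x = x;
        s->best_y = y;
    }
}
```

Keeping the per-candidate path this short makes the fixed overhead around each SAD call easy to see; the real routines also handle things like half-pel and direct-mode variants, which this sketch deliberately leaves out.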


With XMM, all SADs are faster than with MMX, so CheckCandidate takes a
relatively larger share, in particular in B-frame mode. Without B-frames
and QPel, memory transfer and interpolation become more important.
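The half-pel interpolation routines in the profiles build sub-pixel predictions by averaging neighbouring pixels. Below is a plain-C sketch of the horizontal and diagonal 8x8 cases, assuming the usual MPEG-4 half-pel rule with a rounding-control bit (0 or 1). The function names and signatures are hypothetical, and dst uses the same stride as src only to keep the sketch short.

```c
#include <stdint.h>

/* Horizontal half-pel: average each pixel with its right neighbour.
 * rounding is the MPEG-4 rounding-control bit. Reads a 9-wide block. */
static void halfpel_h_8x8(uint8_t *dst, const uint8_t *src, int stride, int rounding)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            const uint8_t *p = src + y * stride + x;
            dst[y * stride + x] = (uint8_t)((p[0] + p[1] + 1 - rounding) >> 1);
        }
}

/* Diagonal (hv) half-pel: average of a 2x2 neighbourhood. Reads 9x9 pixels. */
static void halfpel_hv_8x8(uint8_t *dst, const uint8_t *src, int stride, int rounding)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            const uint8_t *p = src + y * stride + x;
            int sum = p[0] + p[1] + p[stride] + p[stride + 1];
            dst[y * stride + x] = (uint8_t)((sum + 2 - rounding) >> 2);
        }
}
```

The arithmetic per pixel is tiny (a few adds and a shift), so once the SADs are fast, the cost of these routines is dominated by how many blocks get interpolated and how the data is loaded.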



GOAL 1)  Speed up memory transfers, in particular transfer_8to16sub (_mmx)
         and yv12_to_yv12 (_xmm). Maybe those are candidates for prefetching.
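transfer_8to16sub presumably turns an 8x8 block of 8-bit pixels into 16-bit differences against the motion-compensated prediction before the fDCT. The sketch below shows only that subtraction, in plain C, plus one way the prefetch idea could be tried: while the current block is processed, hint the rows of the next block into cache. The function name and the next_cur/next_ref parameters are hypothetical. __builtin_prefetch is a real GCC/Clang builtin, but whether it helps here would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Residual transfer: 8x8 block of current pixels minus the prediction,
 * widened to int16_t for the fDCT. next_cur/next_ref (may be NULL) point at
 * the block that will be processed next; prefetching them is a hypothetical
 * tuning idea, not existing XviD behaviour. */
static void transfer_8to16sub_pf(int16_t dct[64], const uint8_t *cur,
                                 const uint8_t *ref, int stride,
                                 const uint8_t *next_cur, const uint8_t *next_ref)
{
    for (int y = 0; y < 8; y++) {
#if defined(__GNUC__)
        /* A cache hint only: it does not change the computed results. */
        if (next_cur) __builtin_prefetch(next_cur + y * stride);
        if (next_ref) __builtin_prefetch(next_ref + y * stride);
#endif
        for (int x = 0; x < 8; x++)
            dct[y * 8 + x] = (int16_t)(cur[y * stride + x] - ref[y * stride + x]);
    }
}
```

The same "touch the next rows early" pattern would apply to a plane copy like yv12_to_yv12, where each row is a straight memory copy and there is no arithmetic to hide the load latency behind.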



For QPel, it becomes obvious that not everything is written in assembly yet:
interpolate16x16_lowpass_h_c and interpolate16x16_lowpass_v_c
are obvious candidates for ASMing:


GOAL 2)  Create SIMDed versions of interpolate16x16_lowpass_h_c 
         and interpolate16x16_lowpass_v_c
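For reference, here is what the C lowpass routines compute. The sketch assumes they implement the MPEG-4 quarter-pel filter: an 8-tap kernel (-1, 3, -6, 20, 20, -6, 3, -1)/32, with samples beyond the 17-pixel input mirrored at the block edge. The names are hypothetical, only the horizontal direction is shown (the vertical version applies the same taps down a column), and dst shares src's stride for brevity.

```c
#include <stdint.h>

/* Map an out-of-range tap position back into [0, n-1] by mirroring at the
 * block edge: -1 -> 0, -2 -> 1, n -> n-1, n+1 -> n-2. */
static int mirror(int i, int n)
{
    if (i < 0) return -i - 1;
    if (i >= n) return 2 * n - 1 - i;
    return i;
}

/* Shift by 5 (the taps sum to 32) and clip to 0..255.
 * The caller has already added the rounding offset. */
static uint8_t shift_clip(int v)
{
    if (v < 0) return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Horizontal quarter-pel lowpass of a 16x16 block: each output pixel is an
 * 8-tap filter centred between src[x] and src[x+1], so every row reads 17
 * source pixels. */
static void lowpass16x16_h(uint8_t *dst, const uint8_t *src, int stride, int rounding)
{
    static const int taps[8] = { -1, 3, -6, 20, 20, -6, 3, -1 };
    for (int y = 0; y < 16; y++) {
        const uint8_t *row = src + y * stride;
        for (int x = 0; x < 16; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += taps[k] * row[mirror(x - 3 + k, 17)];
            dst[y * stride + x] = shift_clip(sum + 16 - rounding);
        }
    }
}
```

Each output pixel costs eight multiply-adds. The intermediate sums stay between -14*255 = -3570 and 46*255 = 11730, so they fit in signed 16-bit lanes, which makes the filter a natural fit for MMX pmullw/paddw (or pmaddwd). Only the first and last three outputs of a row need the mirrored special case.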


Also, the interpolate-average functions take quite a lot of time and seem to
exist only as MMX versions, not XMM ones.
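Much of what XMM (integer SSE) offers the average functions is presumably pavgb, which averages eight packed bytes with round-up in a single instruction; plain MMX has no such instruction. The portable sketch below emulates that byte-wise average on a 64-bit word (a SWAR trick, not XviD code; names hypothetical). Masking off the low bit of each byte before the shift keeps bits from leaking between neighbouring pixels.

```c
#include <stdint.h>

/* Every byte with its lowest bit cleared, so ">> 1" cannot move a bit from
 * one packed pixel into its neighbour. */
#define NOT_LSB_MASK 0xFEFEFEFEFEFEFEFEull

/* Per byte: (a + b + 1) >> 1, i.e. what pavgb computes. */
static uint64_t avg8_round_up(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & NOT_LSB_MASK) >> 1);
}

/* Per byte: (a + b) >> 1, needed when the MPEG-4 rounding bit is set. */
static uint64_t avg8_round_down(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & NOT_LSB_MASK) >> 1);
}
```

Both identities follow from a + b = 2(a & b) + (a ^ b) applied per byte; neither the addition nor the subtraction can carry or borrow across byte boundaries.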



GOAL 3) Create XMM versions of interpolate8x8_avg4_mmx
                               interpolate8x8_6tap_lowpass_v_mmx
                               interpolate8x8_avg2_mmx
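The four-way average in GOAL 3 has a subtlety. Assume interpolate8x8_avg4 must produce the exactly rounded mean (a + b + c + d + 2 - rounding) >> 2, the MPEG-4 convention for averaging four samples. Under that assumption, the obvious XMM shortcut of two pavgb levels is not bit-exact: each level rounds up, so the result is sometimes one too high. For example, for (0, 1, 0, 0) the exact value is 0 but the nested form gives 1. The plain-C sketch below (hypothetical names) makes the mismatch checkable.

```c
#include <stdint.h>

/* Exact rounded average of four pixels (assumed avg4 semantics). */
static uint8_t avg4_exact(int a, int b, int c, int d, int rounding)
{
    return (uint8_t)((a + b + c + d + 2 - rounding) >> 2);
}

/* One pavgb-style step: average of two bytes, rounding up. */
static uint8_t pavg(int a, int b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* The tempting shortcut: two levels of pavgb. */
static uint8_t avg4_nested(int a, int b, int c, int d)
{
    return pavg(pavg(a, b), pavg(c, d));
}

/* Count the inputs in [0, range)^4 where the shortcut disagrees with the
 * exact result for rounding = 0. */
static int count_avg4_mismatches(int range)
{
    int n = 0;
    for (int a = 0; a < range; a++)
        for (int b = 0; b < range; b++)
            for (int c = 0; c < range; c++)
                for (int d = 0; d < range; d++)
                    if (avg4_nested(a, b, c, d) != avg4_exact(a, b, c, d, 0))
                        n++;
    return n;
}
```

An exact XMM avg4 would therefore need either a correction term on top of the two pavgb steps or a detour through 16-bit arithmetic. Both cost extra instructions, so the achievable speedup for avg4 is likely smaller than for avg2.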



-------------- next part --------------
--- MMX: no B-frames, no QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 16.36      0.89                      sad16v_mmx
  8.27      0.45                      sad16_mmx
  6.99      0.38                      interpolate8x8_halfpel_h_mmx
  6.43      0.35                      sad8_mmx
  5.51      0.30                      interpolate8x8_halfpel_hv_mmx
  5.33      0.29                      transfer_8to16sub_mmx
  4.23      0.23                      yv12_to_yv12_mmx
  4.04      0.22     43643      5.04  SearchP
  3.86      0.21    811072      0.26  CheckCandidate8
  3.86      0.21        11  19090.91  image_create
  3.49      0.19     38715      4.91  CodeBlockInter
  3.49      0.19                      fdct_mmx
  2.94      0.16    558781      0.29  CheckCandidate16
  2.94      0.16        49   3265.31  FrameCodeP
  2.76      0.15                      interpolate8x8_halfpel_v_mmx
  2.02      0.11    101384      1.08  Search8
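How to read these tables: "self seconds" is the sampled time spent inside each function, and "self us/call" is that time divided by the call count. The assembly routines show no call count because gprof counts calls only through compiler-inserted -pg instrumentation, which hand-written assembly lacks; their time is still attributed through program-counter sampling. Also note that the call counts differ between the MMX and XMM runs (for example FrameCodeP: 49 vs 69 calls without B-frames), so absolute seconds are not directly comparable across the two attachments; percentages and per-call times are safer. The helpers below (hypothetical names) reproduce the arithmetic for rows of the first table.

```c
/* gprof's "self us/call": self seconds divided by the number of calls. */
static double self_us_per_call(double self_seconds, long calls)
{
    return self_seconds * 1e6 / (double)calls;
}

/* Total sampled run time implied by one row: self seconds / (% time / 100). */
static double implied_total_seconds(double self_seconds, double percent)
{
    return self_seconds / (percent / 100.0);
}
```

For SearchP, 0.22 s over 43643 calls gives about 5.04 us per call, matching the table. The top row (0.89 s at 16.36%) implies roughly 5.44 s of sampled time for the whole run.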


--- MMX: 2 B-frames, no QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 26.36      2.38                      sad16bi_mmx
 14.29      1.29                      sad16_mmx
  6.42      0.58                      interpolate8x8_halfpel_hv_mmx
  5.43      0.49                      interpolate8x8_halfpel_h_mmx
  3.88      0.35    621052      0.56  CheckCandidateInt
  3.77      0.34    924837      0.37  CheckCandidate16no4v
  3.10      0.28        15  18666.67  image_create
  2.99      0.27                      sad16v_mmx
  2.88      0.26    371282      0.70  CheckCandidateDirectno4v
  2.88      0.26                      interpolate8x8_halfpel_v_mmx
  2.33      0.21                      yv12_to_yv12_mmx
  2.10      0.19                      fdct_mmx

--- MMX: no B-frames, QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 15.76      1.76                      sad16v_mmx
  9.58      1.07                      interpolate8x8_avg4_mmx
  7.34      0.82   2092385      0.39  CheckCandidate16
  6.09      0.68                      interpolate8x8_6tap_lowpass_v_mmx
  5.10      0.57   1702784      0.33  CheckCandidate8
  4.39      0.49    849660      0.58  Interpolate8x8qpel
  4.30      0.48                      interpolate8x8_avg2_mmx
  4.12      0.46     23198     19.83  interpolate16x16_lowpass_v_c
  4.12      0.46                      sad16_mmx
  3.67      0.41                      interpolate8x8_6tap_lowpass_h_mmx
  3.49      0.39     22136     17.62  interpolate16x16_lowpass_h_c
  3.49      0.39                      sad8_mmx
  2.86      0.32     43974      7.28  SearchP
  2.60      0.29    350976      0.83  Interpolate16x16qpel
  2.60      0.29                      fdct_mmx
  2.33      0.26     38549      6.74  CodeBlockInter


--- MMX: 2 B-frames, QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 15.23      2.92                      sad16bi_mmx
 12.62      2.42                      interpolate8x8_avg4_mmx
  8.82      1.69                      interpolate8x8_avg2_mmx
  8.76      1.68                      sad16_mmx
  5.84      1.12                      interpolate8x8_6tap_lowpass_v_mmx
  5.43      1.04   2228765      0.47  Interpolate16x16qpel
  5.37      1.03   1637052      0.63  CheckCandidate16no4v
  4.33      0.83                      sad16v_mmx
  4.17      0.80                      interpolate8x8_6tap_lowpass_h_mmx
  2.45      0.47     79254      5.93  interpolate8x8_lowpass_v_c
  2.40      0.46    951651      0.48  CheckCandidateInt
  2.24      0.43    971364      0.44  CheckCandidate16
  2.03      0.39    500954      0.78  CheckCandidateDirectno4v
-------------- next part --------------
--- XMM: no B-frames, no QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 13.59      0.59                      sad16v_xmm
  7.14      0.31                      sad16_xmm
  6.91      0.30                      interpolate8x8_halfpel_h_xmm
  5.76      0.25   1246016      0.20  CheckCandidate8
  5.76      0.25                      transfer_8to16sub_mmx
  5.76      0.25                      yv12_to_yv12_xmm
  5.53      0.24    793013      0.30  CheckCandidate16
  5.30      0.23                      interpolate8x8_halfpel_hv_xmm
  4.84      0.21                      fdct_mmx
  4.38      0.19                      sad8_xmm
  3.69      0.16     62169      2.57  SearchP
  3.46      0.15     60019      2.50  CodeBlockInter
  3.00      0.13        69   1884.06  FrameCodeP
  3.00      0.13        11  11818.18  image_create
  2.76      0.12    155752      0.77  Search8
  2.76      0.12                      interpolate8x8_halfpel_v_xmm
  2.07      0.09        69   1304.35  image_interpolate

--- XMM: 2 B-frames, no QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 12.15      0.73                      sad16_xmm
 12.15      0.73                      sad16bi_xmm
  9.82      0.59                      interpolate8x8_halfpel_h_xmm
  5.66      0.34   1305381      0.26  CheckCandidate16no4v
  5.66      0.34                      interpolate8x8_halfpel_hv_xmm
  5.32      0.32    797301      0.40  CheckCandidateInt
  4.16      0.25                      sad16v_xmm
  3.99      0.24    542728      0.44  CheckCandidateDirectno4v
  3.99      0.24                      fdct_mmx
  3.33      0.20                      yv12_to_yv12_xmm
  3.00      0.18       113   1592.92  image_interpolate
  3.00      0.18        15  12000.00  image_create
  2.66      0.16     80842      1.98  SearchBF

--- XMM: no B-frames, QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 12.06      1.07                      sad16v_xmm
  9.02      0.80   2947722      0.27  CheckCandidate16
  7.67      0.68                      interpolate8x8_6tap_lowpass_v_mmx
  6.09      0.54                      interpolate8x8_avg4_mmx
  5.98      0.53     38312     13.83  interpolate16x16_lowpass_h_c
  5.52      0.49     42446     11.54  interpolate16x16_lowpass_v_c
  5.41      0.48   2717632      0.18  CheckCandidate8
  4.85      0.43                      interpolate8x8_6tap_lowpass_h_mmx
  3.61      0.32   1357487      0.24  Interpolate8x8qpel
  3.27      0.29                      sad16_xmm
  3.16      0.28                      sad8_xmm
  2.82      0.25     62169      4.02  SearchP
  2.82      0.25                      interpolate8x8_avg2_mmx
  2.82      0.25                      yv12_to_yv12_xmm
  2.59      0.23    497036      0.46  Interpolate16x16qpel
  2.25      0.20                      fdct_mmx
  2.14      0.19     60256      3.15  CodeBlockInter


--- XMM: 2 B-frames, QPel ---

     %      self                self
  time   seconds     calls   us/call  name
 14.13      1.93                      interpolate8x8_avg4_mmx
  9.81      1.34                      interpolate8x8_6tap_lowpass_v_mmx
  7.47      1.02                      interpolate8x8_avg2_mmx
  7.10      0.97                      sad16bi_xmm
  7.03      0.96                      sad16_xmm
  6.00      0.82   3317874      0.25  Interpolate16x16qpel
  4.03      0.55                      interpolate8x8_6tap_lowpass_h_mmx
  3.88      0.53   2255011      0.24  CheckCandidate16no4v
  3.66      0.50                      sad16v_xmm
  3.44      0.47    147554      3.19  interpolate8x8_lowpass_v_c
  3.00      0.41   1323822      0.31  CheckCandidateInt
  2.93      0.40    135325      2.96  interpolate8x8_lowpass_h_c
  2.49      0.34   1313279      0.26  CheckCandidate16


