[XviD-devel] Profiling XVID, Part II

Christoph Lampert chl at math.uni-bonn.de
Sat Mar 1 17:25:53 CET 2003


Hi, now the decoding part: 

Without Qpel, the MMX and XMM profiles are rather similar: 
yv12_to_yv12 and transfer8x8_copy dominate. 

GOAL 4)  _again_ try to squeeze the last bit of speed out of yv12_to_yv12 
         and the transfer functions. 
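
For reference, transfer8x8_copy amounts to a strided 8x8 byte copy, so
there is presumably little arithmetic left to remove and most of the cost
is memory traffic. A minimal C sketch, assuming source and destination
share one row stride; copy8x8_ref is an illustrative name, not the XviD
routine:

```c
#include <stdint.h>
#include <string.h>

/* Copy one 8x8 block of pixels; both images are assumed to use the
 * same row stride (in bytes). Hypothetical reference version. */
static void copy8x8_ref(uint8_t *dst, const uint8_t *src, uint32_t stride)
{
    for (uint32_t y = 0; y < 8; y++) {
        /* Each row is 8 contiguous bytes, i.e. one 64-bit load/store. */
        memcpy(dst + y * stride, src + y * stride, 8);
    }
}
```

Since every row is a single 64-bit move, any gain would have to come
from how the loads and stores are scheduled, not from doing less work.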

With QPel, it's the interpolation again. 

GOAL 5) _again_ try to create a (partial?) SIMD version of  
        interpolate16x16_lowpass_v_c
        interpolate16x16_lowpass_h_c
        interpolate8x8_lowpass_v_c
        interpolate8x8_lowpass_h_c
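
To make clear what a SIMD version would have to compute, here is a sketch
of one output sample of the lowpass filter, assuming the 8-tap kernel
(-1, 3, -6, 20, 20, -6, 3, -1)/32 that MPEG-4 quarter-pel interpolation
uses. The names lowpass_tap8 and clip_u8 and the rounding parameter are
illustrative, not the XviD code, and the special handling of block edges
is left out:

```c
#include <stdint.h>

/* Clamp a filtered value (already scaled by 32) to 0..255 after the shift. */
static uint8_t clip_u8(int32_t v)
{
    if (v < 0)
        return 0;
    v >>= 5;                      /* divide by 32, the sum of the taps */
    return (uint8_t)(v > 255 ? 255 : v);
}

/*
 * One interior sample of the assumed 8-tap lowpass filter.
 * p points at the pixel just before the half-sample position,
 * step is 1 for a horizontal pass or the row stride for a vertical pass,
 * rounding is 0 or 1 (hypothetical rounding-control parameter).
 * The caller must guarantee that p[-3*step] .. p[4*step] are readable.
 */
static uint8_t lowpass_tap8(const uint8_t *p, int32_t step, int32_t rounding)
{
    int32_t sum = 20 * (p[0]         + p[step])
                -  6 * (p[-step]     + p[2 * step])
                +  3 * (p[-2 * step] + p[3 * step])
                -      (p[-3 * step] + p[4 * step]);

    return clip_u8(sum + 16 - rounding);
}
```

The symmetric pairs are what make this a candidate for SIMD: each pair is
one packed add, followed by multiplies with the constants 20, 6 and 3.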


It seems that with those we would optimize encoder and decoder at the same
time, a rare opportunity. 
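
A rough way to estimate what this could buy on the decoder side is
Amdahl's law applied to the attached flat profiles. The sketch below is
only a back-of-the-envelope helper: the function name is made up, and
any per-function speedup factor fed into it is a guess, not a
measurement.

```c
/*
 * Amdahl's law: overall speedup when a fraction p of the runtime
 * (0 <= p <= 1) is made s times faster and the rest is unchanged.
 */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

In the MMX run with no Bframes and Qpel, interpolate16x16_lowpass_v_c and
interpolate16x16_lowpass_h_c together account for 22.40% + 20.32% of the
time, so p = 0.4272. If a SIMD version were 4 times faster (an assumed
figure), amdahl_speedup(0.4272, 4.0) gives about 1.47 overall, and even
an infinitely fast version could not exceed about 1.75.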

gruel 

-------------- next part --------------
-------- no Bframes , no Qpel

     %     self               self
  time  seconds    calls   us/call  name
 17.96     0.44                     yv12_to_yv12_mmx
 16.33     0.40                     transfer8x8_copy_mmx
 11.84     0.29                     interpolate8x8_halfpel_hv_mmx
  9.80     0.24    79948      3.00  decoder_mbinter
  6.94     0.17                     interpolate8x8_halfpel_h_mmx
  5.71     0.14   151602      0.92  get_inter_block
  5.31     0.13                     idct_mmx
  4.90     0.12        6  20000.00  image_create
  4.49     0.11       99   1111.11  decoder_pframe
  3.27     0.08                     interpolate8x8_halfpel_v_mmx
  2.04     0.05   162000      0.31  check_resync_marker
  2.04     0.05                     dequant_inter_mmx


-------- 2 Bframes , no Qpel

     %     self               self
  time  seconds    calls   us/call  name
 17.67     0.50                     transfer8x8_copy_mmx
 16.96     0.48                     yv12_to_yv12_mmx
  7.77     0.22                     interpolate8x8_halfpel_h_mmx
  7.77     0.22                     interpolate8x8_halfpel_hv_mmx
  7.42     0.21       66   3181.82  decoder_bframe
  4.95     0.14    44069      3.18  decoder_bf_interpolate_mbinter
  3.89     0.11        6  18333.33  image_create
  3.89     0.11                     interpolate8x8_halfpel_v_mmx
  3.18     0.09      165    545.45  image_setedges
  3.18     0.09                     idct_mmx
  3.18     0.09                     interpolate8x8_avg2_mmx
  2.83     0.08    75734      1.06  get_inter_block
  2.83     0.08    62851      1.27  decoder_bf_mbinter
  2.83     0.08    22881      3.50  decoder_mbinter
  2.12     0.06    43608      1.38  predict_acdc


-------- no Bframes , Qpel


     %     self               self
  time  seconds    calls   us/call  name
 22.40     0.97    48816     19.87  interpolate16x16_lowpass_v_c
 20.32     0.88    47877     18.38  interpolate16x16_lowpass_h_c
 11.78     0.51                     yv12_to_yv12_mmx
  8.31     0.36                     transfer8x8_copy_mmx
  5.77     0.25    79297      3.15  decoder_mbinter
  3.23     0.14                     interpolate8x8_avg2_mmx
  3.00     0.13     5895     22.05  interpolate16x16_lowpass_hv_c
  3.00     0.13       99   1313.13  decoder_pframe
  3.00     0.13                     dequant_inter_mmx
  2.77     0.12        6  20000.00  image_create
  2.54     0.11                     interpolate8x8_halfpel_hv_mmx
  2.31     0.10                     idct_mmx


-------- 2 Bframes , Qpel


     %     self               self
  time  seconds    calls   us/call  name
 15.01     0.83   179637      4.62  interpolate8x8_lowpass_v_c
 13.02     0.72   138353      5.20  interpolate8x8_lowpass_h_c
  8.86     0.49    45350     10.80  decoder_bf_interpolate_mbinter
  8.14     0.45                     yv12_to_yv12_mmx
  6.87     0.38    19321     19.67  interpolate16x16_lowpass_v_c
  6.69     0.37    16654     22.22  interpolate16x16_lowpass_h_c
  6.69     0.37                     transfer8x8_copy_mmx
  4.52     0.25                     interpolate8x8_avg2_mmx
  3.62     0.20                     interpolate8x8_halfpel_hv_mmx
  3.44     0.19    39827      4.77  interpolate8x8_lowpass_hv_c
  3.07     0.17       66   2575.76  decoder_bframe
  2.53     0.14    61570      2.27  decoder_bf_mbinter
  2.17     0.12       33   3636.36  decoder_pframe


-------------- next part --------------
-------- no Bframes , no Qpel


     %     self               self
  time  seconds    calls   us/call  name
 22.30     0.33                     yv12_to_yv12_xmm
 14.19     0.21                     transfer8x8_copy_mmx
 10.14     0.15    79948      1.88  decoder_mbinter
  6.76     0.10       99   1010.10  decoder_pframe
  6.76     0.10                     interpolate8x8_halfpel_hv_xmm
  6.08     0.09                     interpolate8x8_halfpel_h_xmm
  5.41     0.08   151602      0.53  get_inter_block
  4.73     0.07        6  11666.67  image_create
  4.05     0.06       99    606.06  image_setedges
  4.05     0.06                     interpolate8x8_halfpel_v_xmm
  2.70     0.04    84739      0.47  get_motion_vector
  2.70     0.04                     idct_xmm
  2.03     0.03                     transfer_16to8add_mmx


-------- 2 Bframes , no Qpel

     %     self               self
  time  seconds    calls   us/call  name
 22.35     0.40                     transfer8x8_copy_mmx
 20.67     0.37                     yv12_to_yv12_xmm
  6.70     0.12    44069      2.72  decoder_bf_interpolate_mbinter
  6.15     0.11      165    666.67  image_setedges
  6.15     0.11                     interpolate8x8_halfpel_hv_xmm
  4.47     0.08       66   1212.12  decoder_bframe
  3.91     0.07        6  11666.67  image_create
  3.91     0.07                     idct_xmm
  3.91     0.07                     interpolate8x8_halfpel_v_xmm
  3.35     0.06    62851      0.95  decoder_bf_mbinter
  2.79     0.05                     interpolate8x8_halfpel_h_xmm
  2.23     0.04    75734      0.53  get_inter_block


-------- no Bframes , Qpel


     %     self               self
  time  seconds    calls   us/call  name
 27.10     0.71    48816     14.54  interpolate16x16_lowpass_v_c
 14.50     0.38    47877      7.94  interpolate16x16_lowpass_h_c
 11.83     0.31                     yv12_to_yv12_xmm
 11.07     0.29                     transfer8x8_copy_mmx
  5.73     0.15       99   1515.15  decoder_pframe
  4.96     0.13    79297      1.64  decoder_mbinter
  2.67     0.07   148833      0.47  get_inter_block
  2.67     0.07                     interpolate8x8_avg2_mmx
  2.67     0.07                     interpolate8x8_halfpel_hv_xmm
  2.29     0.06     5895     10.18  interpolate16x16_lowpass_hv_c
  2.29     0.06        6  10000.00  image_create


-------- 2 Bframes ,  Qpel


     %     self               self
  time  seconds    calls   us/call  name
 19.09     0.63   179637      3.51  interpolate8x8_lowpass_v_c
 15.15     0.50   138353      3.61  interpolate8x8_lowpass_h_c
 11.21     0.37                     yv12_to_yv12_xmm
  8.18     0.27    16654     16.21  interpolate16x16_lowpass_h_c
  8.18     0.27                     transfer8x8_copy_mmx
  6.06     0.20    19321     10.35  interpolate16x16_lowpass_v_c
  3.03     0.10    45350      2.21  decoder_bf_interpolate_mbinter
  3.03     0.10       66   1515.15  decoder_bframe
  2.73     0.09    39827      2.26  interpolate8x8_lowpass_hv_c
  2.42     0.08      165    484.85  image_setedges
  2.12     0.07        6  11666.67  image_create
  2.12     0.07                     interpolate8x8_halfpel_hv_xmm

