[XviD-devel] [PATCH] Possible patches for decoder speedup

Edouard Gomez ed.gomez at free.fr
Sun May 16 14:08:37 CEST 2004


Hey hey,

I've been profiling the decoder, and it shows we sux at decoding
bframes. Speeding that up requires some hard work, so I preferred
looking at the second-ranked candidates for optimization.

Obviously, there were two things that could save a lot of cycles w/o much
effort:
 - edging the ref frames only when required.
 - not using 64bit arithmetic in direct coded bframe blocks
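
The first item boils down to a dirty flag per reference frame. Here is a minimal, self-contained sketch of the idea. The names toy_decoder, toy_setedges, ensure_edged and rotate_refs are made up for illustration and are not part of xvidcore; only the is_edged[2] field mirrors what the attached patch adds.

```c
#include <assert.h>

/* Hypothetical, simplified decoder state: NOT the real DECODER struct. */
typedef struct {
    int is_edged[2];  /* 1 once the matching reference frame has been padded */
    int edge_calls;   /* counts how often the expensive padding actually ran */
} toy_decoder;

/* Stand-in for image_setedges(); here it only counts invocations. */
static void toy_setedges(toy_decoder *dec, int ref)
{
    (void)ref;
    dec->edge_calls++;
}

/* Pad reference `ref` only if it has not been padded since it was written. */
static void ensure_edged(toy_decoder *dec, int ref)
{
    assert(ref == 0 || ref == 1);
    if (!dec->is_edged[ref]) {
        toy_setedges(dec, ref);
        dec->is_edged[ref] = 1;
    }
}

/* After a new reference frame is decoded, the old refn[0] becomes refn[1]
 * (its flag moves with it) and the fresh frame in refn[0] is not padded yet. */
static void rotate_refs(toy_decoder *dec)
{
    dec->is_edged[1] = dec->is_edged[0];
    dec->is_edged[0] = 0;
}
```

With this, a P-frame followed by several B-frames pads each reference once instead of once per frame; the flag just has to follow the image whenever the buffers are swapped.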

The first point is only a "never call a function twice on the same image"
optimization, so it shouldn't hurt much. But the second patch touches
somewhat critical code :) so I prefer asking first: was there a really
good reason for TRD/TRB being 64bit data?
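
For reference, here is a rough sketch of what is at stake. It uses the usual MPEG-4 direct-mode scaling (forward = TRB*MV/TRD + delta, backward = (TRB-TRD)*MV/TRD when delta is zero) and compares a 32bit and a 64bit version over a small grid. The function names are invented for the example, and the value ranges are an assumption: the two only agree as long as TRB*MV fits in 32 bits, which is exactly the point that needs confirming for the real decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Forward vector component of a direct-coded block, 32bit arithmetic. */
static int32_t direct_fwd32(int32_t trb, int32_t trd, int32_t mv, int32_t delta)
{
    assert(trd != 0);
    return (trb * mv) / trd + delta;
}

/* Same computation with 64bit intermediates, as in the unpatched code path. */
static int32_t direct_fwd64(int64_t trb, int64_t trd, int32_t mv, int32_t delta)
{
    assert(trd != 0);
    return (int32_t)((trb * mv) / trd + delta);
}

/* Backward vector component: scaled when delta is zero, else derived from fwd. */
static int32_t direct_bwd32(int32_t trb, int32_t trd, int32_t mv,
                            int32_t fwd, int32_t delta)
{
    assert(trd != 0);
    return (delta == 0) ? ((trb - trd) * mv) / trd : fwd - mv;
}

/* Returns 1 if both widths give the same forward result on a sample grid
 * (TRD up to 64 ticks, vectors in [-2048, 2047]: assumed, not exhaustive). */
static int direct_paths_agree(void)
{
    for (int32_t trd = 1; trd <= 64; trd++)
        for (int32_t trb = 0; trb < trd; trb++)
            for (int32_t mv = -2048; mv <= 2047; mv++)
                if (direct_fwd32(trb, trd, mv, 0) != direct_fwd64(trb, trd, mv, 0))
                    return 0;
    return 1;
}
```

Both versions truncate toward zero on division, so they can only differ if the 32bit product overflows.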

Patches are attached, I'd like to have feedback on both.

Results on a 4min sequence:

no patches:
BENCHMARKs: VC:  35,539s VO:   0,038s A:   0,000s Sys:   2,103s =   37,681s

int32 patch alone:
BENCHMARKs: VC:  35,075s VO:   0,037s A:   0,000s Sys:   2,134s =   37,246s 

Both patches applied:
BENCHMARKs: VC:  32,364s VO:   0,038s A:   0,000s Sys:   2,175s =   34,577s

To show we still sux at decoding :-)
Libavcodec:
BENCHMARKs: VC:  22,334s VO:   0,042s A:   0,000s Sys:   2,435s =   24,812s
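
To put the totals above in relative terms, here is a tiny helper (the function names are made up for this example). Fed with the totals from the runs above, it gives roughly 1.2% saved for the int32 patch alone, roughly 8.2% for both patches, and a decode time still about 1.39x that of libavcodec.

```c
#include <assert.h>

/* Percentage of total time saved going from `before` to `after` seconds. */
static double percent_saved(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* How many times longer `ours` takes compared to `theirs`. */
static double slowdown_ratio(double ours, double theirs)
{
    assert(theirs > 0.0);
    return ours / theirs;
}
```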

-- 
Edouard Gomez
-------------- next part --------------
CPU: Athlon, speed 1800.92 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 3000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 500
samples  %        samples  %        image name               symbol name
80522    12.0783  1616      7.4577  libxvidcore.so.4.0       decoder_decode
65451     9.8176  4666     21.5331  libxvidcore.so.4.0       decoder_bf_interpolate_mbinter
56616     8.4924  1409      6.5024  libxvidcore.so.4.0       get_coeff
49677     7.4515  2026      9.3498  libxvidcore.so.4.0       decoder_mbinter
43783     6.5674  138       0.6369  libxvidcore.so.4.0       idct_3dne
37315     5.5972  2347     10.8311  libxvidcore.so.4.0       interpolate8x8_halfpel_hv_3dne
31974     4.7961  274       1.2645  libxvidcore.so.4.0       predict_acdc
31381     4.7071  509       2.3490  libxvidcore.so.4.0       decoder_pframe
24449     3.6673  1125      5.1917  libxvidcore.so.4.0       interpolate8x8_switch
23504     3.5256  343       1.5829  libxvidcore.so.4.0       decoder_mbintra
19801     2.9701  635       2.9305  libxvidcore.so.4.0       transfer8x8_copy_3dne
18704     2.8056  818       3.7750  libxvidcore.so.4.0       image_setedges
15785     2.3677  959       4.4257  libxvidcore.so.4.0       interpolate8x8_halfpel_h_3dne
15338     2.3007  58        0.2677  libxvidcore.so.4.0       dequant_h263_intra_3dne
13877     2.0815  707       3.2627  libxvidcore.so.4.0       interpolate8x8_avg2_mmx.start1
13757     2.0635  488       2.2521  libxvidcore.so.4.0       interpolate8x8_halfpel_v_3dne
13054     1.9581  152       0.7015  libxvidcore.so.4.0       get_mv
9734      1.4601  642       2.9628  libxvidcore.so.4.0       interpolate8x8_halfpel_hv_3dne.rounding1
9053      1.3579  327       1.5091  libxvidcore.so.4.0       __i686.get_pc_thunk.bx
8896      1.3344  115       0.5307  libxvidcore.so.4.0       decoder_mb_decode
8597      1.2895  225       1.0383  libxvidcore.so.4.0       add_acdc
8449      1.2673  202       0.9322  libxvidcore.so.4.0       get_pmv2
7542      1.1313  120       0.5538  libxvidcore.so.4.0       dequant_h263_inter_3dne
6623      0.9934  426       1.9659  libxvidcore.so.4.0       transfer_16to8copy_3dne
5610      0.8415  22        0.1015  libxvidcore.so.4.0       get_dc_size_lum
4889      0.7333  299       1.3799  libxvidcore.so.4.0       interpolate8x8_halfpel_h_3dne.rounding1
4880      0.7320  80        0.3692  libxvidcore.so.4.0       check_resync_marker
4710      0.7065  21        0.0969  libxvidcore.so.4.0       get_inter_block
4703      0.7054  28        0.1292  libxvidcore.so.4.0       get_intra_block
3390      0.5085  2         0.0092  libxvidcore.so.4.0       get_dc_size_chrom
3242      0.4863  228       1.0522  libxvidcore.so.4.0       interpolate8x8_halfpel_v_3dne.rounding1
3151      0.4726  53        0.2446  libxvidcore.so.4.0       get_mcbpc_inter
3019      0.4528  234       1.0799  libxvidcore.so.4.0       get_cbpy
3004      0.4506  50        0.2307  libxvidcore.so.4.0       transfer_16to8add_3dne
2544      0.3816  23        0.1061  libxvidcore.so.4.0       get_dc_dif
2250      0.3375  65        0.3000  libxvidcore.so.4.0       anonymous symbol from section .plt
2033      0.3049  108       0.4984  libxvidcore.so.4.0       interpolate8x8_avg2_mmx.rounding1
2008      0.3012  119       0.5492  libxvidcore.so.4.0       interpolate8x8_avg2_mmx
1936      0.2904  3         0.0138  libxvidcore.so.4.0       BitstreamReadHeaders
443       0.0664  0        0.0e+00  libxvidcore.so.4.0       image_output
352       0.0528  2         0.0092  libxvidcore.so.4.0       decoder_output
261       0.0391  1         0.0046  libxvidcore.so.4.0       xvid_decore
151       0.0226  2         0.0092  libxvidcore.so.4.0       emms_3dn
148       0.0222  2         0.0092  libxvidcore.so.4.0       get_mcbpc_intra
61        0.0091  0        0.0e+00  libxvidcore.so.4.0       image_swap
-------------- next part --------------
* looking for ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26 to compare with
* comparing to ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26
M  src/decoder.c

* modified files

--- orig/src/decoder.c
+++ mod/src/decoder.c
@@ -1215,7 +1215,7 @@
 	uint32_t x, y;
 	VECTOR mv;
 	const VECTOR zeromv = {0,0};
-	const int64_t TRB = dec->time_pp - dec->time_bp, TRD = dec->time_pp;
+	const int32_t TRB = dec->time_pp - dec->time_bp, TRD = dec->time_pp;
 	int i;
 
 	start_timer();



-------------- next part --------------
* looking for ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26 to compare with
* comparing to ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26
M  src/decoder.c
M  src/decoder.h

* modified files

--- orig/src/decoder.c
+++ mod/src/decoder.c
@@ -815,10 +815,13 @@
 		mb_height = (dec->height + 31) / 32;
 	}
 
-	start_timer();
-	image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
-					dec->width, dec->height, dec->bs_version);
-	stop_edges_timer();
+	if (!dec->is_edged[0]) {
+		start_timer();
+		image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
+						dec->width, dec->height, dec->bs_version);
+		dec->is_edged[0] = 1;
+		stop_edges_timer();
+	}
 
 	if (gmc_warp) {
 		/* accuracy: 0==1/2, 1=1/4, 2=1/8, 3=1/16 */
@@ -1218,12 +1221,21 @@
 	const int64_t TRB = dec->time_pp - dec->time_bp, TRD = dec->time_pp;
 	int i;
 
-	start_timer();
-	image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
-					dec->width, dec->height, dec->bs_version);
-	image_setedges(&dec->refn[1], dec->edged_width, dec->edged_height,
-					dec->width, dec->height, dec->bs_version);
-	stop_edges_timer();
+	if (!dec->is_edged[0]) {
+		start_timer();
+		image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
+						dec->width, dec->height, dec->bs_version);
+		dec->is_edged[0] = 1;
+		stop_edges_timer();
+	}
+
+	if (!dec->is_edged[1]) {
+		start_timer();
+		image_setedges(&dec->refn[1], dec->edged_width, dec->edged_height,
+						dec->width, dec->height, dec->bs_version);
+		dec->is_edged[1] = 1;
+		stop_edges_timer();
+	}
 
 	for (y = 0; y < dec->mb_height; y++) {
 		/* Initialize Pred Motion Vector */
@@ -1547,7 +1559,9 @@
 		}
 
 		image_swap(&dec->refn[0], &dec->refn[1]);
+		dec->is_edged[1] = dec->is_edged[0];
 		image_swap(&dec->cur, &dec->refn[0]);
+		dec->is_edged[0] = 0;
 		SWAP(MACROBLOCK *, dec->mbs, dec->last_mbs);
 		dec->last_reduced_resolution = reduced_resolution;
 		dec->last_coding_type = coding_type;


--- orig/src/decoder.h
+++ mod/src/decoder.h
@@ -159,6 +159,9 @@
 	xvid_image_t* out_frm;                /* This is used for slice rendering */
 
 	int * qscale;				/* quantization table for decoder's stats */
+
+	/* Tells if the reference image is edged or not */
+	int is_edged[2];
 }
 DECODER;
 




