[XviD-devel] [PATCH] Possible patches for decoder speedup
Edouard Gomez
ed.gomez at free.fr
Sun May 16 14:08:37 CEST 2004
Hey hey,
I've been profiling the decoder, and it shows we suck at decoding
bframes. Speeding that up requires some hard work, though, so I
preferred to look at the second-ranked candidates for optimization.
There were two obvious things that could save a lot of cycles without
much effort:
- edging the reference frames only when required;
- avoiding 64-bit arithmetic in direct-coded bframe blocks.
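A minimal, self-contained sketch of the first idea, assuming only what the attached patch shows: refn[0] is the newest reference, refn[1] the older one, and the two are rotated with two image_swap() calls after each I/P frame, while B-frames are assumed to read both references without rotating them. The Image and Decoder types and the functions ensure_edged(), rotate_refs() and run_sequence() are hypothetical stand-ins, not XviD code. Image.edged is the ground truth, is_edged[] is the cached flag, and the assert fires if a flag ever claims a buffer is edged when it is not. The detail that matters is that each flag has to follow its buffer through the swaps.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical picture buffer: only "are its edges valid?" is modelled. */
typedef struct {
    int edged;            /* ground truth: border pixels are up to date */
} Image;

/* Hypothetical decoder state mirroring the fields touched by the patch. */
typedef struct {
    Image cur;            /* picture being decoded */
    Image refn[2];        /* refn[0] = newest reference, refn[1] = older one */
    int is_edged[2];      /* cached flags, must always equal refn[i].edged */
    int edge_calls;       /* edging passes actually performed */
} Decoder;

static void swap_images(Image *a, Image *b)
{
    Image t = *a;
    *a = *b;
    *b = t;
}

/* Edges refn[i] only if it has not been edged since it became a reference. */
static void ensure_edged(Decoder *d, int i)
{
    if (!d->is_edged[i]) {
        d->refn[i].edged = 1;   /* stands in for image_setedges() */
        d->is_edged[i] = 1;
        d->edge_calls++;
    }
    assert(d->refn[i].edged);   /* a stale flag would trip this */
}

/* Reference rotation after an I- or P-frame; each flag follows its buffer. */
static void rotate_refs(Decoder *d)
{
    swap_images(&d->refn[0], &d->refn[1]);
    d->is_edged[1] = d->is_edged[0];   /* refn[1] now holds the old refn[0] */
    swap_images(&d->cur, &d->refn[0]);
    d->is_edged[0] = 0;                /* the new picture has no edges yet */
}

/* Decodes a decode-order type string such as "IPBPB"; returns edging passes. */
static int run_sequence(const char *types)
{
    Decoder d;
    memset(&d, 0, sizeof d);
    for (; *types != '\0'; types++) {
        switch (*types) {
        case 'I':                       /* intra: no reference read */
            d.cur.edged = 0;
            rotate_refs(&d);
            break;
        case 'P':                       /* reads refn[0] */
            ensure_edged(&d, 0);
            d.cur.edged = 0;
            rotate_refs(&d);
            break;
        case 'B':                       /* reads both, no rotation */
            ensure_edged(&d, 0);
            ensure_edged(&d, 1);
            break;
        }
    }
    return d.edge_calls;
}
```

Under these assumptions, run_sequence("IPBPBPB") does 4 edging passes, where edging unconditionally (once per P-frame and twice per B-frame, as the unpatched decoder.c does) would do 9.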
The first point is just a "never call a function twice on the same image"
optimization, so it shouldn't hurt much. The second patch, however,
touches somewhat critical code :) so I'd rather ask first: was there a
really good reason for TRD/TRB being 64-bit data?
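For context, following the MPEG-4 Part 2 direct-mode equations (fwd = TRB*MV/TRD + delta, bwd = delta == 0 ? (TRB-TRD)*MV/TRD : fwd - MV), 64 bits are only needed if the TRB*MV product can exceed 2^31. The sketch below is not the decoder.c code: direct_mv32() and agrees_with_int64() are hypothetical helpers, and the bounds |MV| <= 2048 and TRD < 2^20 ticks are assumptions made for illustration, not values taken from XviD. Under those bounds it checks that int32 arithmetic gives the same results as int64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds for this sketch, not taken from XviD:
 * |mv| <= 2048 and 0 < TRD < 2^20 ticks, so |TRB * mv| and
 * |(TRB - TRD) * mv| stay below 2^31 and int32 cannot overflow. */
#define MV_ABS_MAX 2048
#define TRD_LIMIT  (1 << 20)

/* Direct-mode scaling of one MV component (x or y), MPEG-4 Part 2 style:
 *   fwd = TRB * mv / TRD + delta
 *   bwd = (delta == 0) ? (TRB - TRD) * mv / TRD : fwd - mv
 * C99 integer division truncates toward zero. */
static void direct_mv32(int32_t mv, int32_t delta, int32_t trb, int32_t trd,
                        int32_t *fwd, int32_t *bwd)
{
    assert(trd > 0 && trd < TRD_LIMIT && trb >= 0 && trb < trd);
    assert(mv >= -MV_ABS_MAX && mv <= MV_ABS_MAX);
    *fwd = trb * mv / trd + delta;
    *bwd = (delta == 0) ? (trb - trd) * mv / trd : *fwd - mv;
}

/* Checks the int32 version against int64 arithmetic for every TRB and mv. */
static int agrees_with_int64(int32_t trd, int32_t delta)
{
    for (int32_t trb = 0; trb < trd; trb++) {
        for (int32_t mv = -MV_ABS_MAX; mv <= MV_ABS_MAX; mv++) {
            int32_t f, b;
            direct_mv32(mv, delta, trb, trd, &f, &b);
            int64_t f64 = (int64_t)trb * mv / trd + delta;
            int64_t b64 = (delta == 0) ? ((int64_t)trb - trd) * mv / trd
                                       : f64 - mv;
            if (f != f64 || b != b64)
                return 0;
        }
    }
    return 1;
}
```

If time_pp can grow past such a bound, for instance across long gaps between reference frames, the 64-bit product would still be required.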
Patches are attached; I'd like feedback on both.
Results on a 4-minute sequence:
no patches:
BENCHMARKs: VC: 35,539s VO: 0,038s A: 0,000s Sys: 2,103s = 37,681s
int32 patch alone:
BENCHMARKs: VC: 35,075s VO: 0,037s A: 0,000s Sys: 2,134s = 37,246s
Both patches applied:
BENCHMARKs: VC: 32,364s VO: 0,038s A: 0,000s Sys: 2,175s = 34,577s
To show that we still suck at decoding :-)
Libavcodec:
BENCHMARKs: VC: 22,334s VO: 0,042s A: 0,000s Sys: 2,435s = 24,812s
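Put as relative numbers: the helper below only turns the VC (codec) and total times above into percentages saved against the unpatched run. The Run type, runs[], saving_pct() and print_savings() are illustrative names. Since there is no "edging patch alone" run, attributing the remaining ~2.7 s of VC time (35.075 s down to 32.364 s) to the edging patch assumes the two effects simply add up.

```c
#include <assert.h>
#include <stdio.h>

/* VC (codec) and total seconds, copied from the BENCHMARKs lines above. */
typedef struct {
    const char *label;
    double vc;
    double total;
} Run;

static const Run runs[] = {
    { "no patches",        35.539, 37.681 },
    { "int32 patch alone", 35.075, 37.246 },
    { "both patches",      32.364, 34.577 },
    { "libavcodec",        22.334, 24.812 },
};

/* Percentage of time saved by `t` relative to the baseline `base`. */
static double saving_pct(double base, double t)
{
    assert(base > 0.0);
    return 100.0 * (base - t) / base;
}

/* Prints each run's saving against the unpatched XviD run (runs[0]). */
static void print_savings(void)
{
    for (size_t i = 1; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-18s VC %5.1f%%  total %5.1f%%\n", runs[i].label,
               saving_pct(runs[0].vc, runs[i].vc),
               saving_pct(runs[0].total, runs[i].total));
}
```

This gives roughly 1.3% of VC time saved by the int32 patch alone, 8.9% by both patches and 37.2% for libavcodec; even with both patches, XviD's VC time is still about 1.45 times libavcodec's.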
--
Edouard Gomez
-------------- next part --------------
CPU: Athlon, speed 1800.92 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 3000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 500
samples  %        samples  %        image name          symbol name
80522    12.0783  1616     7.4577   libxvidcore.so.4.0  decoder_decode
65451    9.8176   4666     21.5331  libxvidcore.so.4.0  decoder_bf_interpolate_mbinter
56616    8.4924   1409     6.5024   libxvidcore.so.4.0  get_coeff
49677    7.4515   2026     9.3498   libxvidcore.so.4.0  decoder_mbinter
43783    6.5674   138      0.6369   libxvidcore.so.4.0  idct_3dne
37315    5.5972   2347     10.8311  libxvidcore.so.4.0  interpolate8x8_halfpel_hv_3dne
31974    4.7961   274      1.2645   libxvidcore.so.4.0  predict_acdc
31381    4.7071   509      2.3490   libxvidcore.so.4.0  decoder_pframe
24449    3.6673   1125     5.1917   libxvidcore.so.4.0  interpolate8x8_switch
23504    3.5256   343      1.5829   libxvidcore.so.4.0  decoder_mbintra
19801    2.9701   635      2.9305   libxvidcore.so.4.0  transfer8x8_copy_3dne
18704    2.8056   818      3.7750   libxvidcore.so.4.0  image_setedges
15785    2.3677   959      4.4257   libxvidcore.so.4.0  interpolate8x8_halfpel_h_3dne
15338    2.3007   58       0.2677   libxvidcore.so.4.0  dequant_h263_intra_3dne
13877    2.0815   707      3.2627   libxvidcore.so.4.0  interpolate8x8_avg2_mmx.start1
13757    2.0635   488      2.2521   libxvidcore.so.4.0  interpolate8x8_halfpel_v_3dne
13054    1.9581   152      0.7015   libxvidcore.so.4.0  get_mv
9734     1.4601   642      2.9628   libxvidcore.so.4.0  interpolate8x8_halfpel_hv_3dne.rounding1
9053     1.3579   327      1.5091   libxvidcore.so.4.0  __i686.get_pc_thunk.bx
8896     1.3344   115      0.5307   libxvidcore.so.4.0  decoder_mb_decode
8597     1.2895   225      1.0383   libxvidcore.so.4.0  add_acdc
8449     1.2673   202      0.9322   libxvidcore.so.4.0  get_pmv2
7542     1.1313   120      0.5538   libxvidcore.so.4.0  dequant_h263_inter_3dne
6623     0.9934   426      1.9659   libxvidcore.so.4.0  transfer_16to8copy_3dne
5610     0.8415   22       0.1015   libxvidcore.so.4.0  get_dc_size_lum
4889     0.7333   299      1.3799   libxvidcore.so.4.0  interpolate8x8_halfpel_h_3dne.rounding1
4880     0.7320   80       0.3692   libxvidcore.so.4.0  check_resync_marker
4710     0.7065   21       0.0969   libxvidcore.so.4.0  get_inter_block
4703     0.7054   28       0.1292   libxvidcore.so.4.0  get_intra_block
3390     0.5085   2        0.0092   libxvidcore.so.4.0  get_dc_size_chrom
3242     0.4863   228      1.0522   libxvidcore.so.4.0  interpolate8x8_halfpel_v_3dne.rounding1
3151     0.4726   53       0.2446   libxvidcore.so.4.0  get_mcbpc_inter
3019     0.4528   234      1.0799   libxvidcore.so.4.0  get_cbpy
3004     0.4506   50       0.2307   libxvidcore.so.4.0  transfer_16to8add_3dne
2544     0.3816   23       0.1061   libxvidcore.so.4.0  get_dc_dif
2250     0.3375   65       0.3000   libxvidcore.so.4.0  anonymous symbol from section .plt
2033     0.3049   108      0.4984   libxvidcore.so.4.0  interpolate8x8_avg2_mmx.rounding1
2008     0.3012   119      0.5492   libxvidcore.so.4.0  interpolate8x8_avg2_mmx
1936     0.2904   3        0.0138   libxvidcore.so.4.0  BitstreamReadHeaders
443      0.0664   0        0.0e+00  libxvidcore.so.4.0  image_output
352      0.0528   2        0.0092   libxvidcore.so.4.0  decoder_output
261      0.0391   1        0.0046   libxvidcore.so.4.0  xvid_decore
151      0.0226   2        0.0092   libxvidcore.so.4.0  emms_3dn
148      0.0222   2        0.0092   libxvidcore.so.4.0  get_mcbpc_intra
61       0.0091   0        0.0e+00  libxvidcore.so.4.0  image_swap
-------------- next part --------------
* looking for ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26 to compare with
* comparing to ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26
M src/decoder.c
* modified files
--- orig/src/decoder.c
+++ mod/src/decoder.c
@@ -1215,7 +1215,7 @@
uint32_t x, y;
VECTOR mv;
const VECTOR zeromv = {0,0};
- const int64_t TRB = dec->time_pp - dec->time_bp, TRD = dec->time_pp;
+ const int32_t TRB = dec->time_pp - dec->time_bp, TRD = dec->time_pp;
int i;
start_timer();
-------------- next part --------------
* looking for ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26 to compare with
* comparing to ed.gomez at free.fr--2004-1/xvidcore--head--0.0--patch-26
M src/decoder.c
M src/decoder.h
* modified files
--- orig/src/decoder.c
+++ mod/src/decoder.c
@@ -815,10 +815,13 @@
mb_height = (dec->height + 31) / 32;
}
- start_timer();
- image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
- dec->width, dec->height, dec->bs_version);
- stop_edges_timer();
+ if (!dec->is_edged[0]) {
+ start_timer();
+ image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
+ dec->width, dec->height, dec->bs_version);
+ dec->is_edged[0] = 1;
+ stop_edges_timer();
+ }
if (gmc_warp) {
/* accuracy: 0==1/2, 1=1/4, 2=1/8, 3=1/16 */
@@ -1218,12 +1221,21 @@
const int64_t TRB = dec->time_pp - dec->time_bp, TRD = dec->time_pp;
int i;
- start_timer();
- image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
- dec->width, dec->height, dec->bs_version);
- image_setedges(&dec->refn[1], dec->edged_width, dec->edged_height,
- dec->width, dec->height, dec->bs_version);
- stop_edges_timer();
+ if (!dec->is_edged[0]) {
+ start_timer();
+ image_setedges(&dec->refn[0], dec->edged_width, dec->edged_height,
+ dec->width, dec->height, dec->bs_version);
+ dec->is_edged[0] = 1;
+ stop_edges_timer();
+ }
+
+ if (!dec->is_edged[1]) {
+ start_timer();
+ image_setedges(&dec->refn[1], dec->edged_width, dec->edged_height,
+ dec->width, dec->height, dec->bs_version);
+ dec->is_edged[1] = 1;
+ stop_edges_timer();
+ }
for (y = 0; y < dec->mb_height; y++) {
/* Initialize Pred Motion Vector */
@@ -1547,7 +1559,9 @@
}
image_swap(&dec->refn[0], &dec->refn[1]);
+ dec->is_edged[1] = dec->is_edged[0];
image_swap(&dec->cur, &dec->refn[0]);
+ dec->is_edged[0] = 0;
SWAP(MACROBLOCK *, dec->mbs, dec->last_mbs);
dec->last_reduced_resolution = reduced_resolution;
dec->last_coding_type = coding_type;
--- orig/src/decoder.h
+++ mod/src/decoder.h
@@ -159,6 +159,9 @@
xvid_image_t* out_frm; /* This is used for slice rendering */
int * qscale; /* quantization table for decoder's stats */
+
+ /* Tells if the reference image is edged or not */
+ int is_edged[2];
}
DECODER;