[XviD-devel] Proposed patch

Jim Hauxwell james at dattrax.co.uk
Sun Nov 11 14:02:04 CET 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You could easily use conditional moves here and have no lookup and no
branches.

Its a compiler feature, but a lot of cores have some form of predication
now.

Jim

>   Hi everybody,
> 
> Quoting "carlo.bramix" <carlo.bramix at libero.it>:
>> Hello,
>> I must say that I didn't a speed test on the saturation of the quantizer with
>> LUT.
>> Actually, it looked nicer since previous code had to do an addition, two
>> comparisons and two conditional jumps on a plain x86 (best case), while this
>> one do everything, addition & clipping, with a single mov opcode.
>> It could be better an "hybrid" fix with an #ifdef on DQUANT_SAT macro
>> definition and preserve current solution too:
>  I must concur that a systematic table-access for 0.00001% of potentially
>  problematic cases sounds like overkill to me, as opposed to a simple
>  well-predicted test. What about using something like:
>   if (31U > (uint32)(quant-1)) {}
>   else { .. do something rare ... }
>  instead?
> 
>   The rest looks very ok to me. Someone, care to commit?
> 
> skal
> 
> 
>> #if 1 // to decide how to do so...
>> static int __inline DQUANT_SAT(int quant)
>> {
>>     if (quant > 31)
>>         quant = 31;
>>     else
>>     if (quant < 1)
>>         quant = 1;
>>     return quant;
>> }
>> #else
>> static const int32_t dquant_table[4] = {
>>   -1, -2, 1, 2
>> };
>> #define DQUANT_SAT(_quant)  dquant_saturate[(_quant) + 1]
>> static const unsigned char dquant_saturate[] = {
>>   1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
>>   13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
>>   23, 24, 25, 26, 27, 28, 29, 30, 31, 31, 31,
>> };
>> #endif
>>
>> because I admit that it also depends on the hardware platform: ARM and
>> PowerPC versions will work better with previous code instead, otherwise they
>> need to load address of dquant_saturate[] vector too.
>> Or, more easily, you can just throw that idea of clipping...
>> Even if there could be an hypothetical benefit, I don't think it will change
>> the life of the users!
>>
>> Sincerely,
>>
>> Carlo Bramini.
>>
>> ---------- Initial Header -----------
>>
>> From      : xvid-devel-bounces at xvid.org
>> To          : xvid-devel at xvid.org
>> Cc          :
>> Date      : Tue, 06 Nov 2007 02:31:57 +1030
>> Subject : Re: [XviD-devel] Proposed patch
>>
>>> Hi Carlo
>>>
>>> Is the quantizer LUT really any good? Considering that the branches are
>>>   100% predictable, I'd guess extra memory access is actually slowing it
>>> down, as well as wasting memory. Unless you can show otherwise.
>>>
>>> The rest looks very good though.
>>>
>>> Radek
>>>
>>> carlo.bramix wrote:
>>>> Hello,
>>>> I would like to suggest the following patch.
>>>> It doesn't apply "dangerous" changes, it should just simplify/improve
>> some bits of code.
>>>> Sincerely,
>>>>
>>>> Carlo Bramini
>>>>
>>>>
>>>>
>>>> Index: src/decoder.c
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/decoder.c,v
>>>> retrieving revision 1.80
>>>> diff -u -r1.80 decoder.c
>>>> --- src/decoder.c	16 Apr 2007 19:01:28 -0000	1.80
>>>> +++ src/decoder.c	5 Nov 2007 12:44:48 -0000
>>>> @@ -234,6 +234,14 @@
>>>>    -1, -2, 1, 2
>>>>  };
>>>>
>>>> +#define DQUANT_SAT(_quant)  dquant_saturate[(_quant) + 1]
>>>> +
>>>> +static const unsigned char dquant_saturate[] = {
>>>> +  1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
>>>> +  13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
>>>> +  23, 24, 25, 26, 27, 28, 29, 30, 31, 31, 31,
>>>> +};
>>>> +
>>>>  /* decode an intra macroblock */
>>>>  static void
>>>>  decoder_mbintra(DECODER * dec,
>>>> @@ -765,11 +773,7 @@
>>>>
>>>>        if (mb->mode == MODE_INTRA_Q) {
>>>>          quant += dquant_table[BitstreamGetBits(bs, 2)];
>>>> -        if (quant > 31) {
>>>> -          quant = 31;
>>>> -        } else if (quant < 1) {
>>>> -          quant = 1;
>>>> -        }
>>>> +        quant = DQUANT_SAT(quant);
>>>>        }
>>>>        mb->quant = quant;
>>>>        mb->mvs[0].x = mb->mvs[0].y =
>>>> @@ -1009,12 +1013,7 @@
>>>>          if (mb->mode == MODE_INTER_Q || mb->mode == MODE_INTRA_Q) {
>>>>            int dquant = dquant_table[BitstreamGetBits(bs, 2)];
>>>>            DPRINTF(XVID_DEBUG_MB, "dquant %i\n", dquant);
>>>> -          quant += dquant;
>>>> -          if (quant > 31) {
>>>> -            quant = 31;
>>>> -          } else if (quant < 1) {
>>>> -            quant = 1;
>>>> -          }
>>>> +          quant = DQUANT_SAT(quant+dquant);
>>>>            DPRINTF(XVID_DEBUG_MB, "quant %i\n", quant);
>>>>          }
>>>>          mb->quant = quant;
>>>> @@ -1420,11 +1419,7 @@
>>>>            mb->cbp = 0;
>>>>
>>>>          if (mb->mode && mb->cbp) {
>>>> -          quant += get_dbquant(bs);
>>>> -          if (quant > 31)
>>>> -            quant = 31;
>>>> -          else if (quant < 1)
>>>> -            quant = 1;
>>>> +          quant = DQUANT_SAT(quant + get_dbquant(bs));
>>>>          }
>>>>          mb->quant = quant;
>>>>
>>>> Index: src/image/x86_asm/colorspace_mmx.inc
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/image/x86_asm/colorspace_mmx.inc,v
>>>> retrieving revision 1.4
>>>> diff -u -r1.4 colorspace_mmx.inc
>>>> --- src/image/x86_asm/colorspace_mmx.inc	29 Aug 2004 10:02:38 -0000	1.4
>>>> +++ src/image/x86_asm/colorspace_mmx.inc	5 Nov 2007 12:44:49 -0000
>>>> @@ -106,8 +106,7 @@
>>>>    sub ebx, edx
>>>>    mov [x_dif], ebx          ; x_dif = -BYTES*fixed_width - x_stride
>>>>
>>>> -  mov eax, ebp
>>>> -  sub eax, 1
>>>> +  lea eax, [ebp-1]
>>>>    push edx
>>>>    mul edx
>>>>    pop edx
>>>> Index: src/image/x86_asm/colorspace_yuv_mmx.asm
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/image/x86_asm/colorspace_yuv_mmx.asm,v
>>>> retrieving revision 1.6
>>>> diff -u -r1.6 colorspace_yuv_mmx.asm
>>>> --- src/image/x86_asm/colorspace_yuv_mmx.asm	30 Oct 2006 10:52:00
>> -0000	1.6
>>>> +++ src/image/x86_asm/colorspace_yuv_mmx.asm	5 Nov 2007 12:44:51 -0000
>>>> @@ -47,6 +47,14 @@
>>>>  ; Helper macros
>>>>
>> ;=============================================================================
>>>> +%macro _MOVQ        3
>>>> +%if %1 == 1
>>>> +    movntq  %2, %3      ; xmm
>>>> +%else
>>>> +    movq    %2, %3      ; plain mmx
>>>> +%endif
>>>> +%endmacro
>>>> +
>>>>
> ;------------------------------------------------------------------------------
>>>>  ; PLANE_COPY ( DST, DST_STRIDE, SRC, SRC_STRIDE, WIDTH, HEIGHT, OPT )
>>>>  ; DST		dst buffer
>>>> @@ -100,25 +108,14 @@
>>>>    movq mm7, [esi + 48]
>>>>    movq mm0, [esi + 56]
>>>>
>>>> -%if OPT == 0                ; plain mmx
>>>> -  movq [edi     ], mm1           ; write to y_out
>>>> -  movq [edi +  8], mm2
>>>> -  movq [edi + 16], mm3
>>>> -  movq [edi + 24], mm4
>>>> -  movq [edi + 32], mm5
>>>> -  movq [edi + 40], mm6
>>>> -  movq [edi + 48], mm7
>>>> -  movq [edi + 56], mm0
>>>> -%else
>>>> -  movntq [edi     ], mm1         ; write to y_out
>>>> -  movntq [edi +  8], mm2
>>>> -  movntq [edi + 16], mm3
>>>> -  movntq [edi + 24], mm4
>>>> -  movntq [edi + 32], mm5
>>>> -  movntq [edi + 40], mm6
>>>> -  movntq [edi + 48], mm7
>>>> -  movntq [edi + 56], mm0
>>>> -%endif
>>>> +  _MOVQ OPT, [edi     ], mm1           ; write to y_out
>>>> +  _MOVQ OPT, [edi +  8], mm2
>>>> +  _MOVQ OPT, [edi + 16], mm3
>>>> +  _MOVQ OPT, [edi + 24], mm4
>>>> +  _MOVQ OPT, [edi + 32], mm5
>>>> +  _MOVQ OPT, [edi + 40], mm6
>>>> +  _MOVQ OPT, [edi + 48], mm7
>>>> +  _MOVQ OPT, [edi + 56], mm0
>>>>
>>>>    add esi, 64
>>>>    add edi, 64
>>>> @@ -133,13 +130,9 @@
>>>>  %%loop16_pc:
>>>>    movq mm1, [esi]
>>>>    movq mm2, [esi + 8]
>>>> -%if OPT == 0                ; plain mmx
>>>> -  movq [edi], mm1
>>>> -  movq [edi + 8], mm2
>>>> -%else
>>>> -  movntq [edi], mm1
>>>> -  movntq [edi + 8], mm2
>>>> -%endif
>>>> +  _MOVQ OPT, [edi], mm1
>>>> +  _MOVQ OPT, [edi + 8], mm2
>>>>
>>>>    add esi, 16
>>>>    add edi, 16
>>>> @@ -195,26 +188,14 @@
>>>>    jz %%loop16_start_pf
>>>>
>>>>  %%loop64_pf:
>>>> -
>>>> -%if OPT == 0                ; plain mmx
>>>> -  movq [edi     ], mm0          ; write to y_out
>>>> -  movq [edi +  8], mm0
>>>> -  movq [edi + 16], mm0
>>>> -  movq [edi + 24], mm0
>>>> -  movq [edi + 32], mm0
>>>> -  movq [edi + 40], mm0
>>>> -  movq [edi + 48], mm0
>>>> -  movq [edi + 56], mm0
>>>> -%else
>>>> -  movntq [edi     ], mm0        ; write to y_out
>>>> -  movntq [edi +  8], mm0
>>>> -  movntq [edi + 16], mm0
>>>> -  movntq [edi + 24], mm0
>>>> -  movntq [edi + 32], mm0
>>>> -  movntq [edi + 40], mm0
>>>> -  movntq [edi + 48], mm0
>>>> -  movntq [edi + 56], mm0
>>>> -%endif
>>>> +  _MOVQ OPT, [edi     ], mm0          ; write to y_out
>>>> +  _MOVQ OPT, [edi +  8], mm0
>>>> +  _MOVQ OPT, [edi + 16], mm0
>>>> +  _MOVQ OPT, [edi + 24], mm0
>>>> +  _MOVQ OPT, [edi + 32], mm0
>>>> +  _MOVQ OPT, [edi + 40], mm0
>>>> +  _MOVQ OPT, [edi + 48], mm0
>>>> +  _MOVQ OPT, [edi + 56], mm0
>>>>
>>>>    add edi, 64
>>>>    loop %%loop64_pf
>>>> @@ -225,13 +206,8 @@
>>>>    jz %%loop1_start_pf
>>>>
>>>>  %%loop16_pf:
>>>> -%if OPT == 0                ; plain mmx
>>>> -  movq [edi    ], mm0
>>>> -  movq [edi + 8], mm0
>>>> -%else
>>>> -  movntq [edi    ], mm0
>>>> -  movntq [edi + 8], mm0
>>>> -%endif
>>>> +  _MOVQ OPT, [edi    ], mm0
>>>> +  _MOVQ OPT, [edi + 8], mm0
>>>>
>>>>    add edi, 16
>>>>    loop %%loop16_pf
>>>> Index: src/image/x86_asm/interpolate8x8_mmx.asm
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/image/x86_asm/interpolate8x8_mmx.asm,v
>>>> retrieving revision 1.18
>>>> diff -u -r1.18 interpolate8x8_mmx.asm
>>>> --- src/image/x86_asm/interpolate8x8_mmx.asm	13 Sep 2005 12:12:15
>> -0000	1.18
>>>> +++ src/image/x86_asm/interpolate8x8_mmx.asm	5 Nov 2007 12:44:53 -0000
>>>> @@ -574,7 +574,6 @@
>>>>
>>>>    mov eax, [esp + 4 + 24]   ; height -> eax
>>>>    sub eax, 8
>>>> -  test eax, eax
>>>>
>>>>    mov ecx, [esp + 4 + 4]    ; dst -> edi
>>>>    mov eax, [esp + 4 + 8]    ; src1 -> esi
>>>> @@ -604,7 +603,6 @@
>>>>  .rounding1
>>>>    mov eax, [esp + 4 + 24]       ; height -> eax
>>>>    sub eax, 8
>>>> -  test eax, eax
>>>>
>>>>    mov ecx, [esp + 4 + 4]        ; dst -> edi
>>>>    mov eax, [esp + 4 + 8]        ; src1 -> esi
>>>> @@ -1024,9 +1022,7 @@
>>>>    mov eax, [esp + 4 + 8]            ; src -> esi
>>>>    mov edx, [esp + 4 + 12]           ; stride -> edx
>>>>
>>>> -  mov ebx, edx
>>>> -  shl ebx, 1
>>>> -  add ebx, edx
>>>> +  lea ebx, [edx+edx*2]
>>>>
>>>>    pxor mm7, mm7
>>>>
>>>> Index: src/image/x86_asm/postprocessing_mmx.asm
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/image/x86_asm/postprocessing_mmx.asm,v
>>>> retrieving revision 1.4
>>>> diff -u -r1.4 postprocessing_mmx.asm
>>>> --- src/image/x86_asm/postprocessing_mmx.asm	29 Aug 2004 10:02:38
>> -0000	1.4
>>>> +++ src/image/x86_asm/postprocessing_mmx.asm	5 Nov 2007 12:44:53 -0000
>>>> @@ -115,7 +115,7 @@
>>>>  	jl .xloop
>>>>
>>>>  	add edx, ecx				; dst += stride
>>>> -	sub edi, 1
>>>> +	dec edi
>>>>  	jg .yloop
>>>>
>>>>  	pop edi
>>>> Index: src/image/x86_asm/postprocessing_sse2.asm
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/image/x86_asm/postprocessing_sse2.asm,v
>>>> retrieving revision 1.5
>>>> diff -u -r1.5 postprocessing_sse2.asm
>>>> --- src/image/x86_asm/postprocessing_sse2.asm	29 Aug 2004 10:02:38
>> -0000	1.5
>>>> +++ src/image/x86_asm/postprocessing_sse2.asm	5 Nov 2007 12:44:55 -0000
>>>> @@ -130,7 +130,7 @@
>>>>    jl .xloop
>>>>
>>>>    add edx, ecx                  ; dst += stride
>>>> -  sub edi, 1
>>>> +  dec edi
>>>>    jg .yloop
>>>>
>>>>    add esp, 32
>>>> Index: src/image/x86_asm/reduced_mmx.asm
>>>> ===================================================================
>>>> RCS file: /xvid/xvidcore/src/image/x86_asm/reduced_mmx.asm,v
>>>> retrieving revision 1.6
>>>> diff -u -r1.6 reduced_mmx.asm
>>>> --- src/image/x86_asm/reduced_mmx.asm	29 Aug 2004 10:02:38 -0000	1.6
>>>> +++ src/image/x86_asm/reduced_mmx.asm	5 Nov 2007 12:44:57 -0000
>>>> @@ -758,7 +758,7 @@
>>>>    packuswb mm1, mm7
>>>>    movd [esi+eax*4], mm0
>>>>    movd [edi+eax*4], mm1
>>>> -  add eax,1
>>>> +  inc eax
>>>>    jl .Loop
>>>>
>>>>    pop edi
>>>>
>>>> _______________________________________________
>>>> XviD-devel mailing list
>>>> XviD-devel at xvid.org
>>>> http://list.xvid.org/mailman/listinfo/xvid-devel
>>>>
>>>>
>>> _______________________________________________
>>> XviD-devel mailing list
>>> XviD-devel at xvid.org
>>> http://list.xvid.org/mailman/listinfo/xvid-devel
>>>
>> _______________________________________________
>> XviD-devel mailing list
>> XviD-devel at xvid.org
>> http://list.xvid.org/mailman/listinfo/xvid-devel
>>
> 
> 
> _______________________________________________
> XviD-devel mailing list
> XviD-devel at xvid.org
> http://list.xvid.org/mailman/listinfo/xvid-devel
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHNv1MhrNWoHjgI1ARAgsoAKCz3tnyAmuocKhywOHi1Ea9tj4GIgCfTtPP
qCmiJMGlS5qrEMORp0GhggU=
=rMQq
-----END PGP SIGNATURE-----



More information about the XviD-devel mailing list