[XviD-devel] Re: mrSAD

peter ross xvid-devel@xvid.org
Mon, 15 Jul 2002 12:29:53 +1000


>From: Christoph Lampert <chl@math.uni-bonn.de>
>Reply-To: xvid-devel@xvid.org
>To: xvid-devel@xvid.org
>Subject: Re: [XviD-devel] Re: mrSAD
>Date: Mon, 8 Jul 2002 09:58:16 +0200 (CEST)
>
>Also, for syskin's ME and my rewrite of SEARCH8(), which is much too
>slow:
>Would it be possible to have a vector-valued SAD16 that also returns
>the four SAD8 values?
>
>functionality should be
>
>int sad16v_c(ptr,ptr, int* sad8)
>{
>   sad8[0] = SAD8(topleftblock);
>   sad8[1] = SAD8(toprightblock);
>   sad8[2] = SAD8(bottomleftblock);
>   sad8[3] = SAD8(bottomrightblock);
>
>   return (sad8[0]+sad8[1]+sad8[2]+sad8[3]);
>}
>
>However, I guess it would be better to extract the SAD8's during
>intermediate steps of SAD16 than calling SAD8 four times.
>

done. simply paste into sad_mmx.asm. it should be ~10-20% faster than the 
plain C version (assuming SAD8=sad8_xmm). writing a sad16v_mmx is a little 
harder because of register limitations.  please test it against the plain C 
version, e.g. using bitstream comparison.
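
for the comparison, a plain C reference along the lines of the quoted pseudocode could look roughly like the sketch below. the names sad8_ref and sad16v_ref are hypothetical stand-ins made up for this example, not the codec's own functions, and the layout assumed is a 16x16 block with the four 8x8 sub-blocks ordered top-left, top-right, bottom-left, bottom-right.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an 8x8 SAD routine: sums |cur - ref|
 * over 8 rows of 8 pixels, stepping both pointers by `stride`. */
uint32_t sad8_ref(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

/* Reference "vector" SAD16 (assumed name): fills sad8[0..3] with the
 * four 8x8 SADs and returns their sum, as in the quoted pseudocode. */
int sad16v_ref(const uint8_t *cur, const uint8_t *ref,
               uint32_t stride, int *sad8)
{
    assert(sad8 != NULL);
    sad8[0] = (int)sad8_ref(cur, ref, stride);                  /* top-left     */
    sad8[1] = (int)sad8_ref(cur + 8, ref + 8, stride);          /* top-right    */
    sad8[2] = (int)sad8_ref(cur + 8 * stride,
                            ref + 8 * stride, stride);          /* bottom-left  */
    sad8[3] = (int)sad8_ref(cur + 8 * stride + 8,
                            ref + 8 * stride + 8, stride);      /* bottom-right */
    return sad8[0] + sad8[1] + sad8[2] + sad8[3];
}
```

feeding the same blocks to this and to the asm routine and comparing all four entries plus the return value is a cheaper first check than a full bitstream comparison, though it does not replace one.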

michael, i reckon the next homepage poll should be titled:

"what's inside your box?"
-Pentium/Pro/II
-PentiumIII
-PentiumIV
-Itanium/2
-AMD K6-2/3
-Duron/Athlon
-AthlonXP/MP
-Macintosh+Altivec
-Other...

; ---

cglobal sad16v_xmm

;===========================================================================
;int sad16v_xmm(const uint8_t * const cur,
;	        const uint8_t * const ref,
;	        const uint32_t stride,
;	        int* sad8);
;===========================================================================
align 16
sad16v_xmm:
    push ebx            ; callee-saved; arguments now start at esp+4+4
    mov eax, [esp+4+ 4] ; cur
    mov edx, [esp+4+ 8] ; ref
    mov ecx, [esp+4+12] ; stride
    mov ebx, [esp+4+16] ; sad8 (output array of four ints)

    pxor mm5, mm5 ; accum1
    pxor mm6, mm6 ; accum2
    pxor mm7, mm7 ; total
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    paddusw mm7, mm5   ; total += accum1
    paddusw mm7, mm6   ; total += accum2
    movd [ebx], mm5    ; sad8[0]
    movd [ebx+4], mm6  ; sad8[1]

    pxor mm5, mm5 ; accum1
    pxor mm6, mm6 ; accum2
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    paddusw mm7, mm5   ; total += accum1
    paddusw mm7, mm6   ; total += accum2
    movd [ebx+8], mm5  ; sad8[2]
    movd [ebx+12], mm6 ; sad8[3]

    movd eax, mm7      ; return the total in eax
    pop ebx
    ret
;--------
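
as a rough C model of the routine above: the sketch assumes each SAD_16x16_SSE invocation handles one 16-pixel row, adding the SAD of the left 8 bytes into mm5 and of the right 8 bytes into mm6, then advancing both pointers by the stride. the macro body is not shown in this post, so that is an assumption, and sad_row8 and sad16v_rows are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD of one run of 8 pixels; models a single psadbw on 8 bytes. */
uint32_t sad_row8(const uint8_t *a, const uint8_t *b)
{
    uint32_t sad = 0;
    for (int x = 0; x < 8; x++)
        sad += (uint32_t)abs((int)a[x] - (int)b[x]);
    return sad;
}

/* Single pass over the 16x16 block: the 8x8 SADs fall out of the
 * left/right accumulators after every 8 rows, so no separate
 * 8x8 SAD calls are needed. */
int sad16v_rows(const uint8_t *cur, const uint8_t *ref,
                uint32_t stride, int *sad8)
{
    uint32_t total = 0;                   /* plays the role of mm7 */
    assert(sad8 != NULL);
    for (int half = 0; half < 2; half++) {
        uint32_t left = 0, right = 0;     /* mm5 and mm6 */
        for (int row = 0; row < 8; row++) {
            left  += sad_row8(cur, ref);
            right += sad_row8(cur + 8, ref + 8);
            cur += stride;
            ref += stride;
        }
        sad8[2 * half]     = (int)left;   /* [ebx] / [ebx+8]    */
        sad8[2 * half + 1] = (int)right;  /* [ebx+4] / [ebx+12] */
        total += left + right;
    }
    return (int)total;
}
```

the largest possible total is 256 * 255 = 65280, which still fits in 16 bits, so the saturating paddusw adds in the asm never actually saturate and plain integer addition models them exactly.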



-- pete

