[XviD-devel] Re: mrSAD
peter ross
xvid-devel@xvid.org
Mon, 15 Jul 2002 12:29:53 +1000
>From: Christoph Lampert <chl@math.uni-bonn.de>
>Reply-To: xvid-devel@xvid.org
>To: xvid-devel@xvid.org
>Subject: Re: [XviD-devel] Re: mrSAD
>Date: Mon, 8 Jul 2002 09:58:16 +0200 (CEST)
>
>Also, for syskin's ME and my rewriting of SEARCH8() which is much too
>slow:
>Would it be possible to have a vector-valued SAD16 that also returns the
>four SAD8 values?
>
>functionality should be
>
>int sad16v_c(ptr,ptr, int* sad8)
>{
> sad8[0] = SAD8(topleftblock);
> sad8[1] = SAD8(toprightblock);
> sad8[2] = SAD8(bottomleftblock);
> sad8[3] = SAD8(bottomrightblock);
>
> return (sad8[0]+sad8[1]+sad8[2]+sad8[3]);
>}
>
>However, I guess it would be better to extract the SAD8's during
>intermediate steps of SAD16 than calling SAD8 four times.
>
done. simply paste into sad_mmx.asm. it should be ~10-20% faster than the
plainc version (assuming SAD8=sad8_xmm). writing a sad16v_mmx is a little
harder because of register limitations. please test it against the plainc
version, e.g. using bitstream comparison.
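for anyone who wants something to test against, here is a minimal plain C sketch of the requested behaviour. it is not the XviD plainc code: the names sad8_ref, sad16v_ref and sad16v_check are made up for illustration, and the quadrant order (top-left, top-right, bottom-left, bottom-right) is taken from the pseudocode quoted above. it collects the four 8x8 sums in a single pass over the 16x16 block instead of calling SAD8 four times.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: plain 8x8 SAD, used only for cross-checking. */
int sad8_ref(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
    int sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += abs(cur[x] - ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

/* Hypothetical reference: one pass over the 16x16 block, filling
 * sad8[0..3] = top-left, top-right, bottom-left, bottom-right,
 * and returning their sum. */
int sad16v_ref(const uint8_t *cur, const uint8_t *ref,
               uint32_t stride, int *sad8)
{
    sad8[0] = sad8[1] = sad8[2] = sad8[3] = 0;
    for (int y = 0; y < 16; y++) {
        /* rows 0-7 feed sad8[0..1], rows 8-15 feed sad8[2..3] */
        int *half = sad8 + (y < 8 ? 0 : 2);
        for (int x = 0; x < 8; x++) {
            half[0] += abs(cur[x] - ref[x]);         /* left 8 pixels  */
            half[1] += abs(cur[x + 8] - ref[x + 8]); /* right 8 pixels */
        }
        cur += stride;
        ref += stride;
    }
    return sad8[0] + sad8[1] + sad8[2] + sad8[3];
}

/* Cross-check the single-pass version against four 8x8 calls. */
void sad16v_check(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
    int v[4];
    int total = sad16v_ref(cur, ref, stride, v);

    assert(v[0] == sad8_ref(cur, ref, stride));
    assert(v[1] == sad8_ref(cur + 8, ref + 8, stride));
    assert(v[2] == sad8_ref(cur + 8 * stride, ref + 8 * stride, stride));
    assert(v[3] == sad8_ref(cur + 8 * stride + 8, ref + 8 * stride + 8, stride));
    assert(total == v[0] + v[1] + v[2] + v[3]);
}
```

the same comparison can be pointed at an asm version by calling it with the same arguments and checking its four outputs and return value against sad16v_ref.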
michael, i reckon the next homepage poll should be titled:
"what's inside your box?"
-Pentium/Pro/II
-PentiumIII
-PentiumIV
-Itanium/2
-AMD K6-2/3
-Duron/Athlon
-AthlonXP/MP
-Macintosh+Altivec
-Other...
; ---
cglobal sad16v_xmm
;===========================================================================
;int sad16v_xmm(const uint8_t * const cur,
; const uint8_t * const ref,
; const uint32_t stride,
; int* sad8);
;===========================================================================
align 16
sad16v_xmm:
    push ebx                ; callee-saved, used for the sad8 pointer
    mov eax, [esp+4+ 4]     ; Src1 (+4 skips the saved ebx)
    mov edx, [esp+4+ 8]     ; Src2
    mov ecx, [esp+4+12]     ; Stride
    mov ebx, [esp+4+16]     ; sad ptr
    pxor mm5, mm5           ; accum1
    pxor mm6, mm6           ; accum2
    pxor mm7, mm7           ; total
    ; rows 0-7 (one macro call per row)
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    paddusw mm7, mm5        ; total += accum1
    paddusw mm7, mm6        ; total += accum2
    movd [ebx], mm5         ; sad8[0]
    movd [ebx+4], mm6       ; sad8[1]
    pxor mm5, mm5           ; accum1
    pxor mm6, mm6           ; accum2
    ; rows 8-15
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    SAD_16x16_SSE
    paddusw mm7, mm5        ; total += accum1
    paddusw mm7, mm6        ; total += accum2
    movd [ebx+8], mm5       ; sad8[2]
    movd [ebx+12], mm6      ; sad8[3]
    movd eax, mm7           ; return the 16x16 total
    pop ebx
    ret
;--------
-- pete