[XviD-devel] 15% faster search16, if anyone is interested

Thu, 11 Jul 2002 19:14:30 +0200 (CEST)

On Fri, 12 Jul 2002, Radoslaw 'sysKin' Czyz wrote:

> Hi
> As some of you know, I'm trying to create a different inter4v motion
> search.
> As a by-product of my experiments, I've discovered that four sad8s
> conducted in place of one sad16 lead to about 15% faster encoding (for
> non-inter4v). Of course this is not because 4 x sad8 is faster, but
> because it's possible to do an early termination after each sad8.

What exactly does the change look like?

I just tried this: 
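---------------------------------------
/* 16x16 SAD built from four 8x8 SADs; stops adding quadrants
 * as soon as the partial sum is no longer below best_sad. */
int32_t
sad8x4_c(const uint8_t * const cur,
                  const uint8_t * const ref,
                  const uint32_t stride,
                  const uint32_t best_sad)
{
        int sad = sad8(cur,ref,stride);                 /* top-left */
        if (sad<best_sad)
                sad += sad8(cur+8,ref+8,stride);        /* top-right */
        if (sad<best_sad)
                sad += sad8(cur+8*stride,ref+8*stride,stride);          /* bottom-left */
        if (sad<best_sad)
                sad += sad8(cur+8*stride+8,ref+8*stride+8,stride);      /* bottom-right */

        return sad;     /* may be a partial sum, but then it is >= best_sad */
}
---------------------------------------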

(where sad8 = sad8_xmm) and it was 4-5% slower than sad16_xmm.
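
For anyone who wants to reproduce this without the xmm code, here is a rough, self-contained sketch of the same idea together with a caller. This is only an illustration under assumptions: sad8_ref is a plain C stand-in for sad8_xmm, and sad16_early and best_candidate are hypothetical names, not functions from the XviD tree. The point it shows is that early termination can only pay off when the caller passes in the best SAD found so far, so that poor candidates are abandoned after one or two quadrants.

```c
#include <stdint.h>
#include <stdlib.h>

/* Portable reference SAD over one 8x8 block (stand-in for sad8_xmm). */
static uint32_t sad8_ref(const uint8_t *cur, const uint8_t *ref,
                         uint32_t stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

/* 16x16 SAD as four 8x8 quadrants. Stops as soon as the partial sum
 * reaches best_sad, so the result may be a partial sum; in that case
 * it is guaranteed to be >= best_sad. */
static uint32_t sad16_early(const uint8_t *cur, const uint8_t *ref,
                            uint32_t stride, uint32_t best_sad)
{
    static const uint32_t off_x[4] = { 0, 8, 0, 8 };
    static const uint32_t off_y[4] = { 0, 0, 8, 8 };
    uint32_t sad = 0;

    for (int q = 0; q < 4; q++) {
        size_t off = (size_t)off_y[q] * stride + off_x[q];
        sad += sad8_ref(cur + off, ref + off, stride);
        if (sad >= best_sad)
            break;              /* this candidate cannot win anymore */
    }
    return sad;
}

/* Hypothetical caller: returns the index of the candidate block with
 * the lowest SAD and stores that SAD in *best_sad_out. The running
 * minimum is fed back as the threshold for the next candidate. */
static int best_candidate(const uint8_t *cur,
                          const uint8_t *const *cands, int n,
                          uint32_t stride, uint32_t *best_sad_out)
{
    uint32_t best = UINT32_MAX;
    int best_idx = -1;

    for (int i = 0; i < n; i++) {
        uint32_t sad = sad16_early(cur, cands[i], stride, best);
        if (sad < best) {
            best = sad;
            best_idx = i;
        }
    }
    *best_sad_out = best;
    return best_idx;
}
```

How much this gains presumably depends on how often candidates are rejected after the first or second quadrant, and on how the cost of four sad8 calls compares with one sad16 call on the target CPU.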

> I was a bit surprised about this, but whatever, it doesn't matter much
> for my code - it will only benefit if inter4v is off.
> 
> The results were made with pmvfast, no-inter4v, advdiamond, halfpel
> refine and ext_search. It used xmm sad-code (for AthlonXP.. it's xmm
> right? It doesn't matter either)

Christoph

-- 
Christoph H. Lampert chl@math.uni-bonn.de | This signature was generated
Beringstr. 6, Room 14 Tel. (0228) 73-2948 | by machine and requires no
Office hours: none, but usually around    | handwritten signature. AZ 27B-6