[XviD-devel] [BUG] sad32v

syskin at ihug.com.au syskin at ihug.com.au
Fri Aug 22 09:02:22 CEST 2003


Hi again :)

I wrote:
> Uh, of course. Isn't that the case? Does it actually say +8 ?? (I can't check atm, I'm at
> uni).

I returned home for some time and I did check. Horrible.....

I also ran a quick profile of MEAnalysis(). There is one thing that's slowing it down but
doesn't have to - the same applies to P-frame motion search and to all other searches too
(but not that much).
Note that I'm compiling with M$ VS. This compiler will not use CMOV. If other compilers
do, all I say might be bullshit ;)

The following code:
if (sad[1] < iMinSAD[1]) { currentMV[1].x = x; currentMV[1].y = y; iMinSAD[1] = sad[1]; }
if (sad[2] < iMinSAD[2]) { currentMV[2].x = x; currentMV[2].y = y; iMinSAD[2] = sad[2]; }
if (sad[3] < iMinSAD[3]) { currentMV[3].x = x; currentMV[3].y = y; iMinSAD[3] = sad[3]; }
if (sad[4] < iMinSAD[4]) { currentMV[4].x = x; currentMV[4].y = y; iMinSAD[4] = sad[4]; }

..is very slow. The branches are badly predicted - no wonder, the result is not predictable.
The memory movement is also slow, leading to all kinds of bank conflicts you can get on
Athlon...
Any one of the four "if"s is, on average, slower than function call overhead.

Now, my point is that this code looks pretty beautiful for all SIMD processors. I don't know
asm and I can't tell if MMX would help here (it does have some conditionals, right?) - but
if we could try it in MMX, we would really gain a lot (IF cmov is not used by sane
compilers for any reason).
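
To make the SIMD idea concrete, here is a rough sketch of the four updates done as one compare-and-select. It is only an illustration under several assumptions: it uses SSE2 intrinsics rather than MMX, it keeps the x and y components in separate arrays (best_x, best_y) instead of the currentMV[] structs, and the name update_best4 is made up for this example. It has not been measured against the branchy version.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: for each of the four lanes, if sad[i] < min_sad[i],
 * store the candidate vector (x, y) and the new minimum; otherwise keep
 * the old values. No branches are involved. */
static void update_best4(const int32_t sad[4], int32_t x, int32_t y,
                         int32_t min_sad[4],
                         int32_t best_x[4], int32_t best_y[4])
{
    __m128i s  = _mm_loadu_si128((const __m128i *)sad);
    __m128i m  = _mm_loadu_si128((const __m128i *)min_sad);
    __m128i bx = _mm_loadu_si128((const __m128i *)best_x);
    __m128i by = _mm_loadu_si128((const __m128i *)best_y);

    /* Each 32-bit lane becomes all ones where s < m, zero elsewhere. */
    __m128i lt = _mm_cmplt_epi32(s, m);

    /* select(lt, new, old) = (lt & new) | (~lt & old) */
    m  = _mm_or_si128(_mm_and_si128(lt, s),
                      _mm_andnot_si128(lt, m));
    bx = _mm_or_si128(_mm_and_si128(lt, _mm_set1_epi32(x)),
                      _mm_andnot_si128(lt, bx));
    by = _mm_or_si128(_mm_and_si128(lt, _mm_set1_epi32(y)),
                      _mm_andnot_si128(lt, by));

    _mm_storeu_si128((__m128i *)min_sad, m);
    _mm_storeu_si128((__m128i *)best_x, bx);
    _mm_storeu_si128((__m128i *)best_y, by);
}
```

The separate x/y arrays are what make the plain 128-bit loads and stores possible; with the existing array-of-structs layout the vectors would have to be shuffled first, which is part of why changing SearchData is painful.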

Now, another thing: Edouard is trying intrinsics for sad16v. sad16v is only used in
CheckCandidate16() [... and in main ME loop... I just remembered]. If we can write a fast
sad16v, we can as well rewrite the entire CheckCandidate16(). It's called through a
pointer anyway.

I would propose to write CheckCandidate16() in pure nasm, which would be an
"ultimate" solution ;) but I won't - making any changes, in particular to the SearchData
struct, would be a horrible pain. Intrinsics or inlined assembler would work fine, though.

(btw, I did try this trick some time ago:
int mask = (sad[1] - iMinSAD[1])>>31;
currentMV[1].x = (mask&x) | ((~mask)&currentMV[1].x); ... etc. It was much much slower. )
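
For reference, the mask trick written out as a complete function might look like the sketch below. The names Vec2, select32 and update_best are invented for this example, and the mask is built from the comparison result instead of the >>31 shift, because right-shifting a negative int is implementation-defined in C and the subtraction can overflow. Whether this beats the branch depends on the compiler and CPU; as noted above, it was much slower in my test.

```c
#include <stdint.h>

/* Hypothetical stand-in for the motion vector type. */
typedef struct { int32_t x, y; } Vec2;

/* Returns a when mask is all ones, b when mask is zero. */
static int32_t select32(int32_t mask, int32_t a, int32_t b)
{
    return (mask & a) | (~mask & b);
}

/* Branchless version of:
 *   if (sad < *min_sad) { best->x = x; best->y = y; *min_sad = sad; } */
static void update_best(int32_t sad, int32_t x, int32_t y,
                        int32_t *min_sad, Vec2 *best)
{
    /* (sad < *min_sad) is 1 or 0; negating gives -1 (all ones) or 0. */
    int32_t mask = -(int32_t)(sad < *min_sad);

    best->x  = select32(mask, x, best->x);
    best->y  = select32(mask, y, best->y);
    *min_sad = select32(mask, sad, *min_sad);
}
```

Note that this version always writes all three values back to memory, even when nothing changes, which may be one reason it does not help.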

I am downloading the Intel compiler. It supports gcc-style intrinsics and gcc-style inlined
asm (and VC style too), so I'm hoping we'll come up with something useful ;)
Can someone point me to a good resource which would help me learn intrinsics? I don't
know asm but I know what to expect of it (I used to learn m68000 assembler a long long
time ago). Thanks.

Regards,
Radek




