Re[4]: [XviD-devel] Quality optimization

Klaus Post (KPO) xvid-devel@xvid.org
Fri, 24 Jan 2003 14:40:01 +0100


Hi folks - first post after lurking for some months.


Regarding your results, I have some points:

- Code does NOT have to be built in debug mode - you only have to
generate a program database (PDB). Build in Release mode, with full
optimizations.

- The debug build is also why you are getting horrible performance on
abs(). In Release mode, a special trick is used by MSVC to get the
absolute value. This code is non-branching.

- In general, however, branch prediction is something to be aware of.
It's described as "Branching on Random data" in the AMD optimization
guides, and means that you should always avoid data-dependent
branches in inner loops. Conditional moves (cmov) and MMX masking are
much better solutions - but they are already used in most of the code.

- I haven't used NASM for assembler code, but that may be the reason why
you have problems profiling assembler code. It could however still be
able to show the appropriate values, if you place the start and stop
points within the C-code.  You should also note that MSVC might inline
some assembler, and therefore your assembler might not be triggered at
the place you'd expect it to be - click the '+' sign on the (assembler)
function call and see if the code is actually placed there.
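
To make the abs() and masking points concrete, here is a minimal sketch
in plain C of what "non-branching" means. The function names
abs_branchless and select_branchless are made up for illustration and
are not from the XviD source, and I am not claiming this is exactly what
MSVC emits in Release mode. It only shows the general mask idea: build a
value that is all ones or all zeros from the condition, then combine
with XOR/AND/OR instead of jumping.

```c
#include <stdint.h>

/* Branch-free absolute value (hypothetical helper, not XviD code).
 * mask is all ones when x is negative and zero otherwise, so
 * (x ^ mask) - mask negates x only in the negative case.
 * Unsigned arithmetic is used to avoid signed overflow; the final
 * cast back to int32_t assumes two's complement wrap-around. */
static int32_t abs_branchless(int32_t x)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(x < 0);
    uint32_t ux = (uint32_t)x;
    return (int32_t)((ux ^ mask) - mask);
}

/* Branch-free select (hypothetical helper): returns a when cond is
 * non-zero, otherwise b. This is the scalar analogue of masking
 * with packed compare results in MMX code. */
static int32_t select_branchless(int cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Whether this beats a plain branch depends on how predictable the data
is, so it is worth profiling both versions in a Release build.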

CodeAnalyst is a very, very powerful tool for finding hotspots. The only
problem is that it's relatively slow when storing data.

 
Regards, Klaus Post
AviSynth developer.
 
-----Original Message-----
From: Radek Czyz [mailto:radoslaw@syskin.cjb.net] 
Sent: 24. januar 2003 12:18
To: xvid-devel@xvid.org
Subject: Re[4]: [XviD-devel] Quality optimization

Hello,

Christoph wrote

> Yes. MMX has an instruction for that, other CPU ASMs might as well.
> If not, prefetch using C is possible, too. There are also other basic
> operations which always should be tried before immediately going to
> ASM:

> http://cdrom.amd.com/devconn/events/gdc_2002_amd.pdf

Thanks for that. It looks very useful, even though I wasn't able to
get any improvement yet.

Instead, I looked a bit deeper into code optimization techniques and
tools. I downloaded AMD's CodeAnalyst and ran some motion estimation
tests. I have some questions about it.
First thing I noticed: if CodeAnalyst is correct, branch prediction is
horrible. In particular in d_mv_bits() there is:

if (x) { // 50% incorrect prediction
  x = ABS(x) // ~40% incorrect prediction
...
... and the same for y.

I have 2 questions now.
CodeAnalyst requires the library to be built in debug mode, with no
optimization. If these results are true, that only means that the
unoptimized library has a problem, which I don't care about. Does it
actually mean (with reasonable probability) that the optimized library
has the same problem?

And 2. How can I improve branch prediction? In this particular example
I have a hint for the compiler: it is _not_ zero. Usually. This hint
has better chances to be true than 50%...

Do you know better tools to do this stuff? CodeAnalyst ignores asm-ed
code, doesn't allow me to run some functions several times (don't ask
for details, 300MB needed for a single call of SearchP is a hint here)
and so on...

Best Regards,
Radek
