[XviD-devel] [PATCH] calc_cbp_sse2 optimization

Andrew Voznytsa av at polynet.lviv.ua
Sun Apr 18 23:55:03 CEST 2004


Edouard Gomez wrote:

>With windows:
> - with an Intel... hmmm dunno, you can use xvid_bench, but its timing
>   function isn't very precise because it's based on ms (time duration
>   not MS(tm)) precision. Maybe you can give a try at better high
>   precision timers available in Win32 APIs.
>  
>
Use Intel VTune (available for Windows and Linux) and nothing else
(believe me, three years of experience). There is only one disadvantage:
VTune is expensive, about US$ 500-800. On the other hand, a fully
functional free trial version is available.
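The quoted message suggests the high-precision timers in the Win32 API. A minimal sketch of that idea in C, assuming QueryPerformanceCounter/QueryPerformanceFrequency on Windows and, as a plainly substituted stand-in on other systems, POSIX clock_gettime(CLOCK_MONOTONIC). The function name hires_now_ns is hypothetical and not part of xvid_bench.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime on POSIX systems */
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: monotonic time in nanoseconds from the Win32
   high-resolution performance counter. */
static uint64_t hires_now_ns(void)
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);  /* counter ticks per second */
    QueryPerformanceCounter(&count);   /* current tick count */
    assert(freq.QuadPart > 0);
    /* Split into whole seconds and remainder so that multiplying by 1e9
       does not overflow 64 bits. */
    uint64_t secs = (uint64_t)(count.QuadPart / freq.QuadPart);
    uint64_t rem  = (uint64_t)(count.QuadPart % freq.QuadPart);
    return secs * 1000000000ull + rem * 1000000000ull / (uint64_t)freq.QuadPart;
}
#else
#include <time.h>

/* Stand-in for non-Windows systems: the POSIX monotonic clock. */
static uint64_t hires_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* keep the compiler quiet when NDEBUG disables assert */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif
```

Subtracting two readings gives elapsed nanoseconds; the real resolution depends on the platform (on Windows, on the performance-counter frequency), so it is still far coarser than a cycle count.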
As an alternative, you can use the rdtsc instruction. But keep in mind
that if you measure the execution time of a small piece of code, you
will get a wrong result, because the P4 may (and will) execute rdtsc
out of order. For example:

'piece of some code'
rdtsc
'a few instructions to measure'
rdtsc

The P4 will reorder the instructions, so they may effectively execute as:

'piece of some code'
'a few instructions to measure'
rdtsc
rdtsc

or

rdtsc
'piece of some code'
'a few instructions to measure'
rdtsc

To avoid this, insert an instruction that is always executed in order
(a serializing instruction such as cpuid) before each rdtsc. For example:

'piece of some code'

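xor eax, eax    ; select CPUID leaf 0
cpuid           ; serializing: all earlier instructions complete first
rdtsc           ; EDX:EAX = time-stamp counter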

'a few instructions to measure'

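xor eax, eax    ; select CPUID leaf 0
cpuid           ; serialize again so the measured code has finished
rdtsc           ; EDX:EAX = time-stamp counter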
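The serialized sequence above might be written in C roughly as follows. This is a sketch assuming an x86-64 target and a compiler that accepts GCC/Clang extended inline asm; serialized_rdtsc and cycles_for are illustrative names, not XviD functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same sequence as the assembly: xor eax, eax / cpuid / rdtsc.
   CPUID is serializing, so earlier instructions complete before RDTSC
   reads the time-stamp counter. Assumes x86-64 and GCC/Clang asm syntax. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)        /* RDTSC returns the counter in EDX:EAX */
        :
        : "rbx", "rcx", "memory");  /* CPUID also overwrites EBX and ECX;
                                       "memory" stops the compiler from moving
                                       loads/stores across the read */
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: cycles elapsed while running fn(arg). Note that the
   second CPUID executes inside the measured window. */
static uint64_t cycles_for(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    uint64_t start = serialized_rdtsc();
    fn(arg);
    uint64_t end = serialized_rdtsc();
    return end - start;
}
```

Usage would look like `uint64_t c = cycles_for(my_kernel, &data);`, where my_kernel and data are placeholders for the code under test. The "memory" clobber addresses compiler reordering; the cpuid inside the asm addresses the out-of-order execution described above.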

The smallest interval that can be measured with this technique is about
50-100 clock cycles.
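Because the measurement itself costs cycles, one common approach (assumed here, not taken from the post) is to time an empty region many times, keep the minimum as the fixed overhead, and subtract it from real measurements. A sketch assuming x86-64 and GCC/Clang extended inline asm; serialized_rdtsc and measurement_overhead are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* xor eax, eax / cpuid / rdtsc: serialize, then read the time-stamp
   counter into EDX:EAX. Assumes x86-64 and GCC/Clang asm syntax. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory");  /* CPUID clobbers EBX and ECX */
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: estimate the cost of one start/stop pair by timing
   an empty region `runs` times (runs > 0) and keeping the smallest result,
   which is typically the one least disturbed by interrupts or cache misses. */
static uint64_t measurement_overhead(int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = serialized_rdtsc();
        uint64_t t1 = serialized_rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Subtract the returned value from each raw cycle count. Results that come out close to zero sit inside the measurement floor and should not be trusted.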

-- 
Best regards,
Andrew Voznytsa