[XviD-devel] [PATCH] calc_cbp_sse2 optimization
Andrew Voznytsa
av at polynet.lviv.ua
Sun Apr 18 23:55:03 CEST 2004
Edouard Gomez wrote:
>With windows:
> - with an Intel... hmmm dunno, you can use xvid_bench, but its timing
> function isn't very precise because it's based on ms (time duration
> not MS(tm)) precision. Maybe you can give a try at better high
> precision timers available in Win32 APIs.
>
>
Intel VTune (available for Windows and Linux) and nothing else (believe
me, 3 years of experience). There is only one disadvantage: VTune is
expensive, about US$ 500-800. On the other hand, a free trial (fully
functional) version is available.
As an alternative you can use the rdtsc instruction. But keep in mind that
if you want to measure the execution time of a small piece of code, you will
get a wrong result because the P4 may execute rdtsc out of order. For example:
'piece of some code'
rdtsc
'a few instructions to measure'
rdtsc
The P4 may reorder the instructions so that they effectively execute as:
'piece of some code'
'a few instructions to measure'
rdtsc
rdtsc
or
rdtsc
'piece of some code'
'a few instructions to measure'
rdtsc
To avoid this, insert an instruction that is always executed in order
(a serializing instruction such as cpuid) before each rdtsc.
For example:
'piece of some code'
xor eax, eax
cpuid
rdtsc
'a few instructions to measure'
xor eax, eax
cpuid
rdtsc
The smallest time that can be measured with this technique is about 50-100 clocks.
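
For anyone who wants to try the cpuid + rdtsc sequence from C, here is a
minimal sketch. It assumes GCC or Clang inline assembly on an x86-64 machine
(or 32-bit x86 with a compiler that accepts an ebx clobber). The names
serialized_rdtsc, timing_overhead, measure_clocks and sample_work are
hypothetical helpers made up for this illustration, not part of XviD or
xvid_bench. Taking the minimum over several runs and subtracting the cost of
an empty measurement are my additions to reduce noise, not something the
technique requires.

```c
#include <assert.h>
#include <stdint.h>

/* Execute cpuid (serializing) and then read the time-stamp counter.
   Assumption: GCC/Clang inline asm on x86 or x86-64. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"  /* select cpuid leaf 0 */
        "cpuid\n\t"              /* earlier instructions must complete first */
        "rdtsc\n\t"              /* edx:eax = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory"); /* cpuid also overwrites ebx and ecx */
    return ((uint64_t)hi << 32) | lo;
}

/* Smallest delta between two back-to-back reads, i.e. the fixed cost
   of the measurement itself. */
static uint64_t timing_overhead(int runs)
{
    uint64_t best = UINT64_MAX;
    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = serialized_rdtsc();
        uint64_t t1 = serialized_rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Best-of-runs clock count for fn(arg), with the overhead subtracted. */
static uint64_t measure_clocks(void (*fn)(void *), void *arg, int runs)
{
    uint64_t overhead = timing_overhead(runs);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = serialized_rdtsc();
        fn(arg);
        uint64_t t1 = serialized_rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best > overhead ? best - overhead : 0;
}

/* Hypothetical workload: add up *arg integers through a volatile
   accumulator so the compiler cannot remove the loop. */
static void sample_work(void *arg)
{
    volatile uint32_t acc = 0;
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        acc += (uint32_t)i;
}
```

Calling measure_clocks(sample_work, &n, 100) with an int n gives an estimate
in clocks for the loop. Results close to the overhead value should be treated
as noise, which is consistent with the 50-100 clock lower limit mentioned above.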
--
Best regards,
Andrew Voznytsa