[XviD-devel] Inlined ASM code again

Edouard Gomez ed.gomez at free.fr
Wed Aug 20 18:02:01 CEST 2003


Michael Militzer (michael at xvid.org) wrote:
> I'd like to comment on this: Your benchmark is very artificial, so it
> doesn't say much. I'd suggest you should create a sad16 replacement using
> gcc intrinsics, patch XviD to use your newly created sad16 version, switch
> to a 16x16 block search only quality mode (<4) and compare encoding speed
> between your patch and the standard XviD version.

I know, I did warn that it was artificial :)

I'm working on getting sad8, sad16 and sad16v inlined in XviD, and on
replacing the function pointers with defines (nasty).
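
To make the idea concrete, here is a minimal sketch of what an inlined,
intrinsics-based 16x16 SAD bound through a define could look like. It is
not the actual patch: the name sad16_inline and the simplified signature
are hypothetical, SSE2 intrinsics are an assumption (the MMX/SSE variants
would look similar), and a plain C fallback is included so it builds
without SSE2.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical inlined 16x16 SAD; not XviD's real sad16 signature. */
static inline uint32_t
sad16_inline(const uint8_t *cur, const uint8_t *ref, uint32_t stride)
{
	uint32_t sad = 0;
	uint32_t y;
#if defined(__SSE2__)
	__m128i acc = _mm_setzero_si128();

	for (y = 0; y < 16; y++) {
		/* Unaligned loads: reference blocks can start anywhere. */
		__m128i c = _mm_loadu_si128((const __m128i *)(cur + y * stride));
		__m128i r = _mm_loadu_si128((const __m128i *)(ref + y * stride));
		/* psadbw yields two partial sums, one per 8-byte half. */
		acc = _mm_add_epi64(acc, _mm_sad_epu8(c, r));
	}
	/* Add the low and high partial sums. */
	sad = (uint32_t)_mm_cvtsi128_si32(acc)
	    + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
	uint32_t x;

	for (y = 0; y < 16; y++)
		for (x = 0; x < 16; x++)
			sad += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
#endif
	return sad;
}

/* Compile-time binding: callers keep writing sad16(...), but there is
 * no function pointer left, so the compiler is free to inline. */
#define sad16 sad16_inline
```

The price of the define is that the implementation is fixed at compile
time, so run-time CPU detection no longer selects the routine. That is
the nasty part.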

Here is an early gprof profile of the relevant functions (which depend
heavily on the sad functions):
./xvid_encraw -asm -i coastguard-352x288.yuv -w 352 -h 288

With inline/intrinsics functions:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 18.29      7.99     2.89  3705592     0.00     0.00  CheckCandidate16
  8.48      9.33     1.34  6136982     0.00     0.00  CheckCandidate8
  2.91     11.28     0.46   118008     0.00     0.05  SearchP
  1.58     13.81     0.25   641266     0.00     0.00  AdvDiamondSearch
  1.58     14.06     0.25   382432     0.00     0.01  Search8
  0.76     14.89     0.12      298     0.40    44.12  FrameCodeP
  0.32     15.59     0.05    23840     0.00     0.02  MEanalyzeMB
  0.00     15.80     0.00    23840     0.00     0.01  DiamondSearch
  0.00     15.80     0.00      298     0.00     1.60  MEanalysis
[...]

Without inline/intrinsics functions:
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  6.67      8.65     0.98  3705592     0.00     0.00  CheckCandidate16
  8.71      6.44     1.28  6136982     0.00     0.00  CheckCandidate8
  1.84     12.11     0.27   118008     0.00     0.03  SearchP
  2.72     11.48     0.40   641266     0.00     0.00  AdvDiamondSearch
  1.16     12.95     0.17   382432     0.00     0.01  Search8
  1.36     12.78     0.20      298     0.67    34.98  FrameCodeP
  0.27     14.37     0.04    23840     0.00     0.00  MEanalyzeMB
  0.00     14.70     0.00    23840     0.00     0.00  DiamondSearch
  0.00     14.70     0.00      298     0.00     0.22  MEanalysis
[...]

Next I will add an RTC timer for these functions, because gprof's time
resolution is too coarse to conclude anything. All you can see is that
%time and self seconds change from one version to the other, but most of
the time that just shows that code has been inlined and that the function
is now self-contained (the inlined code's time is counted in the caller's
self seconds).
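
For scale: 2.89 s over 3705592 calls to CheckCandidate16 is roughly
0.78 us per call, far below gprof's 0.01 s sample, which is why the
ms/call columns read 0.00. A minimal sketch of a finer-grained timer
follows. It assumes POSIX clock_gettime with CLOCK_MONOTONIC as a
stand-in for whatever timer ends up being used, and prof_counter,
PROF_START and PROF_STOP are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical per-function accumulator. */
typedef struct {
	uint64_t total_ns;	/* accumulated time in nanoseconds */
	uint64_t calls;		/* number of timed sections */
} prof_counter;

/* Current monotonic time in nanoseconds. */
static inline uint64_t
prof_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Bracket the code to be measured with these two macros. */
#define PROF_START(t0)	uint64_t t0 = prof_now_ns()
#define PROF_STOP(ctr, t0) \
	do { (ctr).total_ns += prof_now_ns() - (t0); (ctr).calls++; } while (0)

/* Average time per timed section, in nanoseconds. */
static inline double
prof_avg_ns(const prof_counter *c)
{
	return c->calls ? (double)c->total_ns / (double)c->calls : 0.0;
}
```

Reading the clock has a cost of its own, which is probably not negligible
next to a sub-microsecond function, so it is likely better to time a
whole search (SearchP, for example) than each individual sad call.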

PS: overall fps seems to be about the same.

-- 
Edouard Gomez

