From: Rob Hooft (hooft@fys.ruu.nl)
Date: 08/20/92


From: hooft@fys.ruu.nl (Rob Hooft)
Subject: Re: Jumptable Performance (Was: Re: shared libs - can everyone be happy with this?)
Date: Thu, 20 Aug 1992 10:46:35 GMT

In <1992Aug19.125541.865@crd.ge.com> davidsen@ariel.crd.GE.COM (william E Davidsen) writes:

>In article <1992Aug18.154149.26416@fys.ruu.nl>, hooft@fys.ruu.nl (Rob Hooft) writes:

>| Yes, that is what I expect too, but I didn't expect the usertime to go
>| down at all, certainly not by over 0.5 seconds. We're talking a
>| program here that runs for 25 hard CPU-seconds! I'll be timing again
>| this evening, and might even retry the BYTE-bench this time. Twice,
>| that is. I guess I'll be using jump-libs for the rest of my linux-life....

> Like most things which seem too good to be true, I'm suspicious. Does
>anyone have an explanation why adding size and instructions to every
>library call would make the program use less user CPU (or appear to)?
>Having used jump tables before I have to be suspicious.

I did not run the BYTE bench (I couldn't find the source at home and
have to re-download it), but I retried three different programs with a
number of compilation and linking options. Timings were taken with the
'set time=1' option in tcsh, and every measurement was repeated 5
times. Not all results are reproduced here: that would make the
posting too long.
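
For anyone who wants to repeat this kind of measurement without tcsh,
here is a minimal sketch of a repeat-5-times timing loop. It is not what
was used for the numbers in this post; the function name time_command
and the default command are made up for illustration, and it assumes a
Unix system where Python's resource module is available.

```python
import resource
import subprocess
import sys


def time_command(argv, runs=5):
    """Run argv `runs` times; return one (user, system) CPU-seconds pair per run."""
    results = []
    for _ in range(runs):
        # RUSAGE_CHILDREN accumulates CPU time of children that have been waited for,
        # so the difference around one run is the cost of that run.
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        results.append((after.ru_utime - before.ru_utime,
                        after.ru_stime - before.ru_stime))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in command; replace with e.g. ["./flops"].
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    for user, system in time_command(cmd):
        print(f"{user:.2f}u {system:.2f}s")
```

Printing every run, rather than only an average, makes the spread
between repetitions visible next to the differences between builds.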

My preliminary conclusion is that the differences in timing for the
fp-benchmarks are probably caused by accidental alignment differences,
and that the real differences caused by the jump table are too small
to be measured. The 'ls' benchmark in particular, which should make
about 1E5 library calls, is convincing to me.
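
A rough back-of-the-envelope check of "too small to be measured" could
look like the sketch below. Only the 1E5 call count and the 4.17 s user
time come from this post; the extra cycles per call and the clock
frequency are assumed values chosen purely for illustration, not
measurements.

```python
def jump_overhead_seconds(calls, extra_cycles_per_call, clock_hz):
    """Estimated extra CPU time spent in jump-table indirections."""
    return calls * extra_cycles_per_call / clock_hz


calls = 1e5              # library calls made by 'ls -lR /', as estimated in the post
extra_cycles = 10        # ASSUMPTION: extra cycles per call through the jump table
clock_hz = 33e6          # ASSUMPTION: 33 MHz CPU clock
ls_user_seconds = 4.17   # user time of the -jump 'ls' run from the post

overhead = jump_overhead_seconds(calls, extra_cycles, clock_hz)
print(f"estimated overhead: {overhead:.3f} s "
      f"({100 * overhead / ls_user_seconds:.2f}% of user time)")
```

Under these assumptions the estimate lands in the hundredths of a
second, which is the same order as the resolution of the reported
timings.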

I would really like to get some feedback: please let me know what you
think.

======================================================================
flops.c: (fp-benchmark)
   gcc -O6 -funroll-all-loops -fomit-frame-pointer flops.c -s -o flops

-jump -N    : 1.8081 Mflops, 14.86u 0.06s (14.94 clocktime)
              1.8144 Mflops, 14.86u 0.04s (14.90 clocktime)
              1.8144 Mflops, 14.85u 0.05s (14.90 clocktime)
              1.8207 Mflops, 14.86u 0.04s (14.90 clocktime)
              1.8137 Mflops, 14.87u 0.03s (14.93 clocktime)

-N          : 1.7763 Mflops, 15.35u 0.03s (15.39 clocktime)

-static -N  : 1.8020 Mflops, 15.07u 0.02s (15.10 clocktime)

-static     : 1.7880 Mflops, 15.13u 0.02s (15.16 clocktime)

-m486 -jump : 1.8175 Mflops, 14.86u 0.04s (14.92 clocktime)
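
To put the flops numbers side by side, the sketch below computes the
mean and spread of the five '-jump -N' user times listed above and the
offset of each other configuration from that mean. It assumes the
single line shown for each other configuration is representative of its
five runs, which the post does not spell out.

```python
from statistics import mean, stdev

# User CPU seconds copied from the flops.c results above.
jump_user = [14.86, 14.86, 14.85, 14.86, 14.87]
others = {
    "-N": 15.35,
    "-static -N": 15.07,
    "-static": 15.13,
    "-m486 -jump": 14.86,
}

jump_mean = mean(jump_user)
jump_spread = stdev(jump_user)  # sample standard deviation of the five runs
print(f"-jump -N: mean {jump_mean:.2f} s, stdev {jump_spread:.3f} s")

for label, user in others.items():
    diff = user - jump_mean
    print(f"{label:12s} {diff:+.2f} s ({100 * diff / jump_mean:+.1f}%)")
```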

============================================================
lorenz.c: (fp-benchmark)
   gcc -N -O6 -funroll-all-loops -fomit-frame-pointer -s lorenz.c -o lorenz

-jump         : 60.31u 0.03s 1:00.40
-m486 -jump   : 60.09u 0.03s 1:00.18
-m486         : 60.40u 0.03s 1:00.50
-m486 -static : 60.10u 0.02s 1:00.18
============================================================
GNU ls: (does a lot of library calls!)
  gcc -N ls.a -o ls
  ./ls -lR / > /dev/null
(that was 5707 files on three 64 MB minix partitions)

-jump    : 4.17u 5.64s 9.91
(shared) : 4.12u 5.57s 9.90
-static  : 4.09u 5.66s 9.89

(Considering the "large" run-to-run differences in the test results,
 these three programs have "exactly" the same performance.)
============================================================

-- 
Rob Hooft, Bijvoet Center for Biomolecular Research, 
Chemistry department University of Utrecht, the Netherlands
hooft@hutruu54.bitnet hooft@chem.ruu.nl hooft@fys.ruu.nl hooft@cc.ruu.nl