Re: performance measurement and caches

alms@pesqueira.di.ufpe.br (Andre Santos)
16 Feb 1996 23:40:35 -0500

          From comp.compilers


From: alms@pesqueira.di.ufpe.br (Andre Santos)
Newsgroups: comp.compilers,comp.arch
Date: 16 Feb 1996 23:40:35 -0500
Organization: Departamento de Informatica - UFPE
References: 96-02-165
Keywords: storage, hardware, performance

boehm@parc.xerox.com (Hans Boehm) writes:
>[Execution time of a program can vary by a factor of two depending on
>cache effects due to where in memory a program is loaded]


Indeed this is a terrible problem for anyone doing performance
measurements. Cache behaviour plays a big role in the timings, and
it is affected by far too many factors (machine load, the locality of
other running processes, etc.). The only solutions I know of are the
two obvious ones:


- Repeat the measurements many times and average them, which is
    sometimes impractical depending on how long the experiments take
    and how many of them you have to run.
- Use a simulator to count clock cycles (?).
    I don't think such simulators are easy to get hold of,
    and I think they tend to be very slow too.
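A minimal sketch of the first option, assuming Python purely for illustration (the post names no language or tool for this): run the workload several times and report the spread alongside the mean, so that run-to-run variation from cache and placement effects is visible instead of averaged away. `time_repeated` and the workload below are hypothetical; for a compiled benchmark you would time whole runs of the binary rather than a Python callable.

```python
import statistics
import time
from typing import Callable


def time_repeated(fn: Callable[[], object], runs: int = 10) -> dict:
    """Run fn() `runs` times and summarise the wall-clock timings.

    Reporting stdev/min/max next to the mean shows how unreliable
    a single run would have been.
    """
    if runs < 2:
        raise ValueError("need at least two runs to estimate spread")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical workload standing in for one compiled benchmark program.
    summary = time_repeated(lambda: sum(i * i for i in range(200_000)), runs=5)
    for key, value in summary.items():
        print(f"{key:>6}: {value}")
```

The median is less sensitive than the mean to the occasional slow run caused by machine load, which is why it is reported too; the cost, as noted above, is that every configuration must be run `runs` times.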


For the experiments in my PhD thesis I had to compare the performance of
about 50 functional programs (some of them quite big), each compiled
in dozens of different ways (different sets of optimisations).
In practice this made both options above a nightmare. I
ended up using Shade, a SPARC instruction-set simulator from Sun
that counts instructions (among other things), not cycles,
and has quite a small overhead (2 to 6 times slower than native
execution). This way a single run gives you a repeatable result.
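Shade has its own interface, which is not reproduced here. As a hypothetical illustration of why counting instructions gives a repeatable number where timing does not, this sketch interprets a made-up five-opcode instruction set and tallies executed instructions by class (loads, stores, ALU ops, branches). The opcodes, `OPCODE_CLASS` and `run_and_count` are all invented for the example.

```python
from collections import Counter

# Hypothetical toy instruction set; each opcode is assigned to a class, roughly
# the way an instruction-set simulator groups loads, stores, ALU ops and branches.
OPCODE_CLASS = {
    "load": "load",
    "store": "store",
    "add": "alu",
    "sub": "alu",
    "jnz": "branch",
}


def run_and_count(program, memory):
    """Interpret `program` over `memory` (modified in place) and return
    per-class counts of executed instructions.

    The counts depend only on the program and its input, not on caches or
    machine load, so every run yields the same numbers.
    """
    regs = {"r0": 0, "r1": 0, "r2": 0}
    counts = Counter()
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        counts[OPCODE_CLASS[op]] += 1
        pc += 1
        if op == "load":            # load reg, addr
            regs[args[0]] = memory[args[1]]
        elif op == "store":         # store reg, addr
            memory[args[1]] = regs[args[0]]
        elif op == "add":           # add dst, src_reg
            regs[args[0]] += regs[args[1]]
        elif op == "sub":           # sub dst, immediate
            regs[args[0]] -= args[1]
        elif op == "jnz":           # jnz reg, target: branch if reg != 0
            if regs[args[0]] != 0:
                pc = args[1]
    return counts


if __name__ == "__main__":
    # Sum n + (n-1) + ... + 1, with n (which must be >= 1) in memory[0]
    # and the result stored in memory[1].
    program = [
        ("load", "r0", 0),     # r0 = n (loop counter)
        ("load", "r1", 1),     # r1 = running total
        ("add", "r1", "r0"),   # total += counter
        ("sub", "r0", 1),      # counter -= 1
        ("jnz", "r0", 2),      # loop while counter != 0
        ("store", "r1", 1),    # memory[1] = total
    ]
    memory = [3, 0]
    counts = run_and_count(program, memory)
    print(dict(counts), memory[1])
    # prints {'load': 2, 'alu': 6, 'branch': 3, 'store': 1} 6
```

Running it again on the same input prints the same counts, which is the property that makes a single simulated run sufficient; what a count cannot tell you is how many cycles those instructions cost on a real cache hierarchy.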


For my particular needs I checked (on a subset of my programs) that
changes in the number of instructions executed were closely enough
related to changes in the programs' execution times (like you, I was
measuring small improvements), and then used instruction counts to
compare performance. The simulator also gives counts for each
instruction or group of instructions (e.g. loads, stores, etc.), so one
can take those into account when comparing programs. See
http://www.cs.washington.edu/research/compiler/papers.d/shade.html
It is free, but you have to send a form to Sun to get it.
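One way to make the "closely enough related" check explicit, sketched under assumptions (the post does not say which statistic was used): for a subset of programs, measure both instruction counts and execution times before and after an optimisation, turn each into a per-program ratio, and compute the Pearson correlation between the two lists of ratios. A value near 1 suggests that, for that subset, instruction counts track time well. The function names and the per-class weights in `weighted_cost` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sample with zero variance has no correlation")
    return cov / (sx * sy)


def relative_change(baseline: Sequence[float], optimised: Sequence[float]) -> List[float]:
    """Per-program ratio optimised/baseline (below 1.0 means an improvement)."""
    return [o / b for b, o in zip(baseline, optimised)]


def weighted_cost(class_counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Combine per-class instruction counts using assumed per-class costs.

    The weights are the experimenter's own assumptions (e.g. charging a load
    more than an ALU op); classes without a weight count as 1.0. They are not
    cycle counts for any real machine.
    """
    return sum(count * weights.get(cls, 1.0) for cls, count in class_counts.items())
```

In use, you would pass `relative_change` of the instruction counts and `relative_change` of the measured times, for the same programs in the same order, to `pearson`; `weighted_cost` is one simple way of taking the per-class counts into account instead of treating every instruction as equal.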


BTW, an interesting paper on cache effects in performance measurements
(in my area, functional programming) is at
ftp://ftp.dcs.gla.ac.uk/pub/glasgow-fp/tech_reports/FP-94-??_spiking-caches.ps.Z
It has some details on SPARC caches.
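As a rough illustration of the placement effect Hans Boehm describes (not a model of any particular SPARC cache), this sketch simulates a small direct-mapped cache. Two arrays touched in turn collide on every access when their base addresses differ by exactly the cache size, but miss only once per line when one of them is shifted by a few lines. The cache geometry (64 lines of 32 bytes) and the access pattern are assumptions chosen for the example.

```python
def count_misses(addresses, cache_lines: int, line_size: int) -> int:
    """Count misses of a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * cache_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size        # memory block containing this byte
        index = block % cache_lines      # direct-mapped: exactly one candidate line
        tag = block // cache_lines
        if tags[index] != tag:           # not present: miss, then fill the line
            misses += 1
            tags[index] = tag
    return misses


def alternating_trace(base_a: int, base_b: int, n: int) -> list:
    """Hypothetical access pattern: a loop reading a[i] then b[i] (4-byte elements)."""
    trace = []
    for i in range(n):
        trace.append(base_a + 4 * i)
        trace.append(base_b + 4 * i)
    return trace


if __name__ == "__main__":
    # 64 lines * 32 bytes = 2 KiB cache; each array holds 64 elements (8 lines).
    same_sets = alternating_trace(0, 2048, 64)      # b exactly one cache size after a
    shifted = alternating_trace(0, 2048 + 256, 64)  # b moved by 8 lines
    print(count_misses(same_sets, 64, 32))  # prints 128 (every access misses)
    print(count_misses(shifted, 64, 32))    # prints 16 (one miss per line touched)
```

The program and the work done are identical in both runs; only where the second array lands in memory differs, yet the miss count changes by a factor of 8, which is the kind of effect that makes raw timings hard to compare.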


Andre.
--
Andre Santos Departamento de Informatica
e-mail: alms@di.ufpe.br Universidade Federal de Pernambuco
http://www.di.ufpe.br/~alms CP 7851, CEP 50732-970, Recife PE Brazil
--

