|performance measurement and caches firstname.lastname@example.org (1996-02-14)|
|Re: performance measurement and caches Terje.Mathisen@hda.hydro.com (Terje Mathisen) (1996-02-16)|
|Re: performance measurement and caches email@example.com (1996-02-16)|
|Re: performance measurement and caches firstname.lastname@example.org (1996-02-16)|
|Re: performance measurement and caches email@example.com (1996-02-16)|
|Re: performance measurement and caches firstname.lastname@example.org (1996-02-16)|
|Re: performance measurement and caches email@example.com (Mary Fernandez) (1996-02-16)|
|Re: performance measurement and caches firstname.lastname@example.org (1996-02-17)|
|Re: performance measurement and caches Terje.Mathisen@hda.hydro.com (Terje Mathisen) (1996-02-18)|
|Re: performance measurement and caches email@example.com (1996-02-19)|
|Re: performance measurement and caches firstname.lastname@example.org (1996-02-21)|
From: email@example.com (Andre Santos)
Date: 16 Feb 1996 23:40:35 -0500
Organization: Departamento de Informatica - UFPE
Keywords: storage, hardware, performance
firstname.lastname@example.org (Hans Boehm) writes:
>[Execution time of a program can vary by a factor of two depending on
>cache effects due to where in memory a program is loaded]
Indeed this is a terrible problem for anyone performing performance
measurements. Cache behaviour plays a big role in the timings, and it
is affected by far too many factors (machine load, locality of other
running processes, etc.). The only solutions I know of are the two
obvious ones:
- Repeat the measurements many times and average them, which is
sometimes impractical depending on how long the experiments take
and how many you have to do.
- Use a simulator to count clock cycles. I don't think such
simulators are easy to get hold of, and I think they tend to be
very slow too.
For experiments in my PhD thesis I had to compare the performance of
about 50 functional programs (some of them quite big) and to compile
them in dozens of different ways (different sets of optimisations).
This in practice made using the two options above a nightmare. I
ended up using Shade, a SPARC instruction-set simulator from Sun
that counts instructions (among other things), not cycles, and has
quite small overhead (2 to 6 times slower than the original
execution time). This way, in one go, you have a repeatable result.
For my particular needs I checked (for a subset of my programs) that
the effects on instructions executed corresponded closely enough to
the effects on execution time (like you, I was also measuring small
improvements), and used instruction counts for comparing performance.
The simulator also gives counts for each instruction or group of
instructions (e.g. loads, stores, etc.), so one can take that into
account when comparing programs. Shade
is free, but you have to send a form to Sun to get it.
BTW, an interesting paper on cache effects in performance measurements
(in my area, functional programming) is in
It has some details on SPARC caches.
Andre Santos Departamento de Informatica
e-mail: email@example.com Universidade Federal de Pernambuco
http://www.di.ufpe.br/~alms CP 7851, CEP 50732-970, Recife PE Brazil