Re: performance measurement and caches

chase@centerline.com (David Chase)
16 Feb 1996 01:27:43 -0500

          From comp.compilers


From: chase@centerline.com (David Chase)
Newsgroups: comp.compilers
Date: 16 Feb 1996 01:27:43 -0500
Organization: CenterLine Software
References: 96-02-165
Keywords: testing, performance

In article 165@comp.compilers, boehm@parc.xerox.com (Hans Boehm) writes:
> [slowtest took twice as long as fasttest]
> So what's wrong? Slowtest and fasttest were bitwise identical executables!


> ... likely explanation. (Corrections
> appreciated.) The machine in question was a SPARCstation 10, with an
> L2 cache. Apparently the L2 cache is physically indexed, and direct
> mapped. The executable files were read once and loaded into the file
> system cache. Thus their location in physical memory did not change
> across runs.


This is roughly correct, except that you ought to have been able to
flush them out of the FS cache by doing lots of other file operations
(for example, "find / -exec cp '{}' /dev/null \;"). In
addition, collisions in the external cache induce collisions in the
(4- or 5-way set-associative) on-chip D and I caches, reducing the benefits of
such set associativity (*). And, on the SS10, a load that misses both
the on- and off-chip caches is an expensive load indeed.


(*) An induced collision follows from "contents of external cache
include contents of internal cache" -- if you have that rule, and you
have a collision in the external cache, then it "induces" a collision
in the on-chip cache, even though the on-chip cache is
set-associative.
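The footnote's argument can be made concrete with a toy simulation. This is a minimal sketch with made-up sizes (a 4-set, 4-way L1, a 64-line direct-mapped L2, and 32-byte lines), not the SS10's real geometry. The class and function names are hypothetical. It models an inclusive hierarchy: when a line is evicted from the external cache, it is also removed from the on-chip cache.

```python
# Toy model of "induced" collisions under cache inclusion.
# All sizes are illustrative assumptions, not SuperSPARC/SS10 parameters.
LINE = 32                  # bytes per cache line
L1_SETS, L1_WAYS = 4, 4    # small 4-way set-associative on-chip cache
L2_LINES = 64              # small direct-mapped external cache


class InclusiveHierarchy:
    def __init__(self):
        # Each L1 set is a list of line numbers, least recently used first.
        self.l1 = [[] for _ in range(L1_SETS)]
        # Each L2 slot holds at most one line number (direct-mapped).
        self.l2 = [None] * L2_LINES
        self.l1_misses = 0

    def access(self, addr):
        line = addr // LINE
        l1_set = self.l1[line % L1_SETS]
        if line in l1_set:             # L1 hit: move to most-recently-used end
            l1_set.remove(line)
            l1_set.append(line)
            return
        self.l1_misses += 1
        slot = line % L2_LINES
        victim = self.l2[slot]
        if victim != line:
            if victim is not None:
                # Inclusion: a line evicted from L2 must also leave L1,
                # even if its L1 set still had free ways.
                victim_set = self.l1[victim % L1_SETS]
                if victim in victim_set:
                    victim_set.remove(victim)
            self.l2[slot] = line
        if len(l1_set) >= L1_WAYS:     # L1 set full: evict its LRU way
            l1_set.pop(0)
        l1_set.append(line)


def count_l1_misses(addrs, rounds=10):
    """Access each address in `addrs` in turn, `rounds` times; return L1 misses."""
    h = InclusiveHierarchy()
    for _ in range(rounds):
        for a in addrs:
            h.access(a)
    return h.l1_misses
```

With these sizes, addresses 0 and `L2_LINES * LINE` (2048) fall into the same L1 set and the same L2 line. Alternating between them misses in L1 on every one of the 20 accesses, even though the 4-way L1 set has room for both. Addresses 0 and `L1_SETS * LINE` (128) also share an L1 set but use different L2 lines, so they miss only on first touch.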


> Since I'm trying to measure performance differences much less than a
> factor of 2, where does this leave me? ... How does someone trying
> to performance-tune a product determine whether they're making
> progress?


Your answer below is the correct one. When I worked at Sun, for
performance evaluation (not benchmark generation, but deciding if the
compilers were actually better or not) we used an SS10 with no
external cache. And I think we still took the best of three runs.
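A best-of-N timing harness might look like the following minimal sketch. The `best_of` name and the injectable `timer` parameter are illustrative, not a tool used at Sun. Taking the minimum filters out transient noise, such as interrupts or competing processes. It cannot remove a bias that is identical on every run, such as an unlucky physical page placement that persists while the executable stays in the file-system cache. That is the reason for also using a machine with no external cache.

```python
import time


def best_of(fn, runs=3, timer=time.perf_counter):
    """Call fn() `runs` times and return the shortest elapsed time.

    The minimum is used rather than the mean because interference from
    other activity can only make a run slower, never faster.
    """
    best = float("inf")
    for _ in range(runs):
        start = timer()
        fn()
        best = min(best, timer() - start)
    return best
```

For example, `best_of(lambda: sum(range(100_000)))` returns the fastest of three timings in seconds. The `timer` argument exists mainly so the harness can be exercised with a fake clock.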


It's also somewhat OS-dependent, because different versions of the OS
use different page mapping policies (and I think they improved over
time).
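One family of page mapping policies is page coloring. Under page coloring, the OS tries to give virtually adjacent pages physical frames that map to different regions of a physically indexed cache. This post does not say which policy any particular SunOS or Solaris release used. The sketch below only illustrates the arithmetic. It uses hypothetical names and an assumed geometry of 4 KB pages and a 1 MB direct-mapped external cache.

```python
from collections import Counter

# Assumed geometry (illustrative only): 4 KB pages and a 1 MB physically
# indexed, direct-mapped external cache, giving 1 MB / 4 KB = 256 page colors.
PAGE_SIZE = 4 * 1024
CACHE_SIZE = 1024 * 1024
NUM_COLORS = CACHE_SIZE // PAGE_SIZE


def page_color(phys_frame):
    """Page-sized slice of the cache that physical frame `phys_frame` maps to."""
    return phys_frame % NUM_COLORS


def color_conflicts(frames):
    """Number of pages in `frames` beyond the first of each color.

    Pages of the same color compete for the same lines in a
    direct-mapped, physically indexed cache.
    """
    counts = Counter(page_color(f) for f in frames)
    return sum(c - 1 for c in counts.values())
```

Four pages placed in frames 0 through 3, as a coloring allocator would try to arrange, have no conflicts: `color_conflicts([0, 1, 2, 3])` is 0. If the same pages happen to land in frames 5, 261, 517 and 773, all four have color 5 and `color_conflicts` returns 3, meaning the pages evict one another from the external cache.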


> ... I may find an SS10 or 20 without the L2 cache, but those seem
> to have become rare.


I think there's some way to turn off the external cache without
physically removing it from the machine. This may require a reboot,
of course, but such is life.


speaking for myself,


David Chase
--

