performance measurement and caches

boehm@parc.xerox.com (Hans Boehm)
14 Feb 1996 21:34:00 -0500

          From comp.compilers

From: boehm@parc.xerox.com (Hans Boehm)
Newsgroups: comp.compilers,comp.arch
Date: 14 Feb 1996 21:34:00 -0500
Organization: Xerox Palo Alto Research Center
Summary: Is it still possible to get useful measurements?
Keywords: storage, hardware, performance

An anecdote from a few days ago:


I'm trying to obtain performance measurements of some small to medium
sized applications for a conference paper. Being somewhat experienced
at such things, I cautiously start with a smallish (2000 line or so)
test. I repeatedly time two executables, getting approximately
repeatable times for each:


tweety% time ./slowtest
SUCCEEDED

real 10.5
user 10.2
sys 0.2

tweety% time ./fasttest
SUCCEEDED

real 5.5
user 5.2
sys 0.2


So what's wrong? Slowtest and fasttest were bitwise identical executables!


Based on conversations with Marvin Theimer and Tim Diebert, we even
managed to come up with a likely explanation. (Corrections
appreciated.) The machine in question was a SPARCstation 10, with an
L2 cache. Apparently the L2 cache is physically indexed and
direct-mapped. The executable files were read once and loaded into the file
system cache. Thus their location in physical memory did not change
across runs. (The performance did change when I repeated the test a
day later.) Apparently slowtest was loaded in an unfortunate
position, so that two very frequently accessed code sections collided
in the cache. The only way to remedy the situation was to flush the
file system cache.
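The collision explanation can be made concrete with a toy model. The Python sketch below replays accesses to two small hot code regions through a physically indexed, direct-mapped cache. The cache size (1 MB) and line size (32 bytes) are illustrative assumptions, not the actual SPARCstation 10 parameters. The helper names (`line_index`, `count_misses`, `hot_loop_trace`) are hypothetical. If the two regions' physical addresses differ by a multiple of the cache size, they map to the same cache lines and evict each other on every pass. Placed anywhere else, they miss only on first touch.

```python
# Toy model of a physically indexed, direct-mapped cache.
# CACHE_SIZE and LINE_SIZE are assumed values, not SPARCstation 10 specs.
CACHE_SIZE = 1 << 20            # 1 MB
LINE_SIZE = 32                  # bytes per line
NUM_LINES = CACHE_SIZE // LINE_SIZE


def line_index(paddr: int) -> int:
    """The one cache line a physical address can occupy."""
    return (paddr // LINE_SIZE) % NUM_LINES


def count_misses(addresses) -> int:
    """Replay physical addresses through the cache and count misses."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        idx = line_index(addr)
        tag = addr // CACHE_SIZE     # distinguishes addresses sharing a line
        if tags[idx] != tag:
            misses += 1
            tags[idx] = tag          # evict whatever was there
    return misses


def hot_loop_trace(base_a: int, base_b: int, size: int = 256,
                   iterations: int = 1000) -> list:
    """Alternately touch every line of two hot code regions."""
    trace = []
    for _ in range(iterations):
        trace.extend(range(base_a, base_a + size, LINE_SIZE))
        trace.extend(range(base_b, base_b + size, LINE_SIZE))
    return trace


if __name__ == "__main__":
    lucky = hot_loop_trace(0x200000, 0x200000 + 4096)
    unlucky = hot_loop_trace(0x200000, 0x200000 + CACHE_SIZE)
    print(count_misses(lucky))    # 16: only the first touch of each line
    print(count_misses(unlucky))  # 16000: the regions evict each other
```

A direct-mapped cache gives each address exactly one possible slot, so there is no associativity to absorb the conflict. Which slot a line gets depends only on its physical address, and the OS chose that address when it read the file into the file system cache.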


Since I'm trying to measure performance differences much less than a
factor of 2, where does this leave me? Is there any reason to believe
any of the performance measurements that have appeared in research
papers, or benchmarks published in magazines? How does someone trying
to performance-tune a product determine whether they're making
progress?


Note that this problem is not limited to SPARCstation 10s. As far as
I know, nearly all modern workstations are subject to some such
effects. (PCs running MS OSes seem a bit better for completely the
wrong reason. As far as I can tell, they generally don't cache files
read over the net. The executables in this case were not local, so
that would have saved me. Typical 486 and Pentium L2 caches seem just
as bad. I may find an SS10 or 20 without the L2 cache, but those seem
to have become rare.)


The problem isn't even limited to the cache. Many machines have
primary memory with nonuniform access times. (I started out on a
SPARC 2 with SBUS memory.) PCs with ISA memory are particularly bad.
And some PC motherboards interleave memory when possible, but not
for the leftover SIMMs at the end.


Do we need the "shuffle physical memory" OS call?
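One way to read the "shuffle physical memory" suggestion: if placement were randomized across runs, a layout-dependent slowdown would show up as run-to-run spread rather than as a fixed bias. The hypothetical Python sketch below simulates this with a toy physically indexed, direct-mapped cache. All of its parameters are assumed values: a 1 MB cache, 32-byte lines, 4 KB page frames, and 32 MB of memory. Each trial places two hot code regions in randomly chosen page frames and counts misses. Names such as `misses_for_placement` and `shuffled_trials` are illustrative only, and no real OS call is implied.

```python
import random
import statistics

# Assumed toy parameters; not taken from any real machine.
CACHE_SIZE = 1 << 20          # physically indexed, direct-mapped cache (1 MB)
LINE_SIZE = 32                # bytes per cache line
PAGE_SIZE = 4096              # bytes per physical page frame
NUM_FRAMES = 8192             # 32 MB of physical memory
NUM_LINES = CACHE_SIZE // LINE_SIZE


def misses_for_placement(frame_a, frame_b, size=256, iterations=100):
    """Count misses when two hot regions, each starting at the beginning of
    its page frame, are executed alternately `iterations` times."""
    tags = [None] * NUM_LINES
    misses = 0
    for _ in range(iterations):
        for frame in (frame_a, frame_b):
            base = frame * PAGE_SIZE
            for addr in range(base, base + size, LINE_SIZE):
                idx = (addr // LINE_SIZE) % NUM_LINES
                tag = addr // CACHE_SIZE
                if tags[idx] != tag:
                    misses += 1
                    tags[idx] = tag
    return misses


def shuffled_trials(trials, seed=0):
    """Miss counts over `trials` random placements of the two regions."""
    rng = random.Random(seed)
    return [misses_for_placement(*rng.sample(range(NUM_FRAMES), 2))
            for _ in range(trials)]


if __name__ == "__main__":
    results = shuffled_trials(1000)
    print("min", min(results), "median", statistics.median(results),
          "max", max(results))
    # Frames whose numbers agree modulo CACHE_SIZE // PAGE_SIZE collide.
    print("conflicting placements:", sum(r > 16 for r in results))
```

With randomized placement, the median would reflect typical layouts and the maximum would expose the unlucky ones. With the fixed placement described above, every repeated run lands on the same point of that distribution.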


Hans-J. Boehm
(boehm@parc.xerox.com)
--

