Re: performance measurement and caches

Terje Mathisen <Terje.Mathisen@hda.hydro.com>
18 Feb 1996 13:32:47 -0500

          From comp.compilers


From: Terje Mathisen <Terje.Mathisen@hda.hydro.com>
Newsgroups: comp.compilers,comp.arch
Date: 18 Feb 1996 13:32:47 -0500
Organization: Hydro
References: 96-02-165 96-02-195
Keywords: benchmarks, performance, architecture

Hans Boehm (boehm@parc.xerox.com) wrote:
> [Benchmark speed varied by a factor of two, apparently depending on
> memory location and cache behavior.]


Mary Fernandez wrote:
> I observed similar effects of procedure placement on cache
> performance on both MIPS and Intel 486. For a small set of large
> Modula-3 programs, procedure placement alone affected runtime by
> up to 15%; on the Intel up to 10%. Smaller effects than those you
> observed, but still confounding if you're trying to measure
> performance differences in that neighborhood. Just injecting NOPs
> at procedure boundaries perturbs alignment enough to produce
> measurable (>10%) differences in elapsed time.


I believe this is mostly due to the way the 486, Pentium and PPro handle
code prefetching. Both the 486 and the PPro really like to have the top
of busy loops aligned near the beginning of a cache line, so you can
get this effect in single-tasking mode, with no system traffic at all.


The Pentium relaxed the branch target requirement to 32-bit boundaries.
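
To make the alignment point concrete, here is a minimal sketch of the
arithmetic involved: given the address of a loop top, how far into its
cache line does it sit, and how many filler bytes (NOPs, say) would move
it to the start of the next line? The helper names line_offset and
pad_to_line are made up for illustration, and the line size is a
parameter you must supply yourself (16 bytes on the 486 and 32 bytes on
the Pentium and PPro, as far as I know; those figures are not from the
measurements quoted above).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of addr within its cache line.
 * line_size is an assumption supplied by the caller and must be a
 * power of two (e.g. 16 or 32). */
static uint32_t line_offset(uint32_t addr, uint32_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & (line_size - 1);
}

/* Hypothetical helper: number of filler bytes needed so that code
 * currently at addr would start exactly on a cache line boundary.
 * Returns 0 when addr is already aligned. */
static uint32_t pad_to_line(uint32_t addr, uint32_t line_size)
{
    return (line_size - line_offset(addr, line_size)) & (line_size - 1);
}
```

For example, a loop top at 0x1003 with 16-byte lines sits 3 bytes into
its line and would need 13 bytes of padding, which also shows why a few
injected NOPs elsewhere can shift a hot loop into or out of a good
position.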


--
-Terje Mathisen (include std disclaimer) <Terje.Mathisen@hda.hydro.com>
--

