Re: The RISC penalty

pardo@cs.washington.edu (David Keppel)
19 Dec 1995 14:13:28 -0500

          From comp.compilers


From: pardo@cs.washington.edu (David Keppel)
Newsgroups: comp.compilers
Date: 19 Dec 1995 14:13:28 -0500
Organization: Computer Science & Engineering, U. of Washington, Seattle
References: 95-12-063 95-12-077 95-12-103
Keywords: architecture, performance

pardo@cs.washington.edu writes:
>[Conclusion: code-expanding transformations are less likely to be
> successful with a RISC because RISCs are running closer to
> saturation of the memory bandwidth.]


John Levine <compilers-request@iecc.com> writes:
>[I didn't see evidence that his observations apply to other programs.]


At the RISC (er, risk) of beating to death something that should be on
`comp.arch': Dick Sites has been going around giving a talk whose
conclusions are similar in some ways.


His group built a tool that instruments DEC Alpha/AXP binaries for
Windows NT and also instruments Windows NT itself. They then ran the
binaries and gathered ``snapshots'' of instruction and data references
from the programs. The workload was SPEC92 and a TPS benchmark. Once
you throw in system references, the cache hit ratio of even SPEC
doesn't look so good. The TPS benchmark has an inner loop of several
hundred thousand instructions spanning multiple protection domains (a
friend of mine says that on the x86 the same benchmark has about
100,000 instructions per iteration of the inner loop). On the TPS
benchmark, three cycles of every four were spent servicing misses.
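
As a rough illustration of what that last figure implies, here is a
minimal sketch using the usual stall-cycle model. It is not Sites'
methodology; the function names, the base CPI of 1.0, and the miss
numbers are all assumptions made up for the example.

```python
# Toy stall-cycle model. All names and numbers here are illustrative
# assumptions, not measurements from Sites' talk.

def effective_cpi(base_cpi, stall_fraction):
    """CPI observed when `stall_fraction` of all cycles service misses."""
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    # Only (1 - stall_fraction) of the cycles do useful work.
    return base_cpi / (1.0 - stall_fraction)


def stall_fraction(base_cpi, misses_per_instr, miss_penalty):
    """Fraction of all cycles spent stalled on misses."""
    stall_cycles = misses_per_instr * miss_penalty  # stall cycles per instruction
    return stall_cycles / (base_cpi + stall_cycles)


if __name__ == "__main__":
    # Three cycles of every four stalled means CPI is four times the base.
    print(effective_cpi(1.0, 0.75))
    # Hypothetical: 3 misses per 100 instructions at 100 cycles each.
    print(stall_fraction(1.0, 0.03, 100))
```

Under this model, spending three cycles of every four on misses means
the machine runs at four times its no-miss CPI, whatever that base
figure happens to be.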


According to an acquaintance who studied caches in industry, most
database stuff has lousy cache behavior -- I and D -- and most other
applications (including, say, VLSI tools) are much closer to SPEC.
However, with sufficiently fast processors and sufficiently large
instructions, they become what Sites calls ``pin limited'' -- the
optimizations that make the most sense are those that shrink the code
size. Furthermore, the effect of periodic interrupts, system calls,
etc., tends to change the CPI noticeably compared to the user-mode-only
reference stream.
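
The trade-off between code-expanding and code-shrinking optimizations
can be sketched with a toy model. Every parameter below is
hypothetical, chosen only to show the shape of the argument: a
transformation that removes some executed instructions but adds
instruction-fetch misses pays off only while the miss penalty stays
small.

```python
# Toy model of a code-expanding transformation (e.g. unrolling).
# All names and numbers are hypothetical, for illustration only.

def cycles(instr_count, base_cpi, misses_per_instr, miss_penalty):
    """Total cycles: each instruction costs its base CPI plus miss stalls."""
    return instr_count * (base_cpi + misses_per_instr * miss_penalty)


def pays_off(before, after):
    """True if the transformed code runs in fewer cycles than the original."""
    return cycles(**after) < cycles(**before)


def compare(miss_penalty):
    # Assumed effect of the transformation: 10% fewer executed
    # instructions, but 50% more misses per instruction.
    before = dict(instr_count=1_000_000, base_cpi=1.0,
                  misses_per_instr=0.010, miss_penalty=miss_penalty)
    after = dict(instr_count=900_000, base_cpi=1.0,
                 misses_per_instr=0.015, miss_penalty=miss_penalty)
    return pays_off(before, after)


if __name__ == "__main__":
    for penalty in (10, 100):
        print(penalty, compare(penalty))
```

With these made-up numbers the transformation wins at a 10-cycle miss
penalty and loses at a 100-cycle one, which is the sense in which a
faster processor pushes the same code toward being pin limited.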


(The above is all from memory; there are likely errors. I expect that
Sites' talk will be appearing at some point in paper/electronic form.)


To summarize: Pittman picked an extreme example, but the hardware
technology trends are making his observations more widely applicable
over time.


;-D on ( App. Lick Able ) Pardo
[Hmmn. Time to dust off those VAX manuals. -John]

