Related articles |
---|
Optimizations to improve icache hit rate Jeffrey.Kenton@comcast.net (Jeff Kenton) (2004-04-21) |
Re: Optimizations to improve icache hit rate pohjalai@cc.helsinki.fi (A Pietu Pohjalainen) (2004-04-28) |
Re: Optimizations to improve icache hit rate anton@mips.complang.tuwien.ac.at (2004-04-29) |
From: | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
Newsgroups: | comp.compilers |
Date: | 29 Apr 2004 12:00:05 -0400 |
Organization: | Institut fuer Computersprachen, Technische Universitaet Wien |
References: | 04-04-068 |
Keywords: | design, optimize |
Posted-Date: | 29 Apr 2004 12:00:05 EDT |
Jeff Kenton <Jeffrey.Kenton@comcast.net> writes:
>I had a discussion during an interview about optimizations designed to
>decrease icache misses. It seems to me that code re-arrangement will
>give small benefits at most, and that general optimizations,
>especially those that decrease overall code size, will be more
>productive in the end.
Counterevidence:
- Mueller and Whalley [mueller&whalley92] found that an optimization
that increases code size by 50%, but increases spatial locality,
reduces the number of I-cache misses even for very small I-caches
(IIRC they simulated 2KB caches).
- We tried to use partial replication to reduce the code size and the
I-cache misses resulting from full replication [ertl&gregg03] without
losing most of its branch prediction accuracy advantage. We found
that the code size was reduced significantly, but the number of
I-cache misses increased, again due to worse spatial locality.
@InProceedings{mueller&whalley92,
author = "Frank Mueller and David B. Whalley",
title = "Avoiding Unconditional Jumps by Code Replication",
booktitle = "{SIGPLAN} '92 Conference on Programming Language
Design and Implementation",
year = "1992",
pages = "322--330",
annote = "Nearly all unconditional jumps can be eliminated by
code replication. This expands compiled C code by about
50\%, but reduces the number of executed instructions
and the cache work (miss ratio increases slightly).
There are some nontrivial problems in this technique,
which are solved heuristically in this paper."
}
@InProceedings{ertl&gregg03,
author = "M. Anton Ertl and David Gregg",
title = "Optimizing Indirect Branch Prediction Accuracy in
Virtual Machine Interpreters",
booktitle = "{SIGPLAN} '03 Conference on Programming Language
Design and Implementation",
year = "2003",
URL = "http://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz",
abstract = "Interpreters designed for efficiency execute a huge
number of indirect branches and can spend more than
half of the execution time in indirect branch
mispredictions. Branch target buffers are the best
widely available\mn{on all recent general-purpose
machines?} form of indirect branch prediction; however,
their prediction accuracy for existing interpretes is
only 2\%--50\%. In this paper we investigate two
methods for improving the prediction accuracy of BTBs
for interpreters: replicating virtual machine (VM)
instructions and combining sequences of VM instructions
into superinstructions. We investigate static
(interpreter build-time) and dynamic (interpreter
run-time) variants of these techniques and compare them
and several combinations of these techniques. These
techniques can eliminate nearly all of the dispatch
branch mispredictions, and have other benefits,
resulting in speedups by a factor of up to 3.17 over
efficient threaded-code interpreters, and speedups by a
factor of up to 1.3 over techniques relying on
superinstructions alone."
}
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html
Return to the
comp.compilers page.
Search the
comp.compilers archives again.