Related articles |
---|
Basic-Block Profiling Isn't Always Accurate larus@primost.cs.wisc.edu (1993-03-08) |
Re: Basic-Block Profiling Isn't Always Accurate anik@crhc.uiuc.edu (1993-03-09) |
Re: Basic-Block Profiling Isn't Always Accurate glew@pdx007.intel.com (1993-03-11) |
Re: Basic-Block Profiling Isn't Always Accurate pohua@gomez.intel.com (1993-03-12) |
Re: Basic-Block Profiling Isn't Always Accurate pardo@cs.washington.edu (1993-03-14) |
Newsgroups: | comp.arch,comp.compilers |
From: | larus@primost.cs.wisc.edu (James Larus) |
Keywords: | architecture, performance, tools |
Organization: | University of Wisconsin, Madison -- Computer Sciences Dept. |
Date: | Mon, 8 Mar 1993 22:49:38 GMT |
In porting QPT to the SPARC, we found a limitation on the accuracy of
basic-block profiling (which is the type performed by most profilers).
The problem is that blocks that end with an annulled conditional branch do
not always execute their last instruction. Assuming that the instruction
executes (as it would with a non-annulled conditional) leads to profiles
that are up to 5-10% high (on SPEC92 integer benchmarks). The only
general solution is to profile edges, not blocks, in the control-flow
graph [1].
Many machines have annulled conditional branches that squash (do not
execute) the instruction in a branch's delay slot when the branch is
either taken or not taken. On the SPARC (and most other machines), the
instruction in the delay slot is executed if the branch is taken and is
squashed if the branch falls through.
An edge profiler can easily determine the number of times that the
instruction in the delay slot executes by looking at the frequency on the
branch's "taken" edge. A block profiler cannot determine this value,
because, in general, block frequencies do not uniquely determine edge
frequencies.
For example, consider the code:
....
bne,a xxx
add %o1, %o1, 1
...
xxx: ....
The ADD instruction is in the delay slot of an annulled branch (SPARC
notation). It executes each time the branch jumps to label xxx. If no
other instruction jumps to label xxx or falls through into its block, a
profiler could determine how many times the ADD executes by looking the
execution frequency of basic block xxx. However, if these conditions do
no hold, the profiler can't determine how many times the ADD executes.
The effect of this problem on an instruction profile depends on the
frequency of annulled branches and the size of basic blocks. In
non-numeric programs, with small blocks, the effect can be surprisingly
large.
/Jim
James Larus
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706
larus@cs.wisc.edu
608-262-9519
[1] Thomas Ball and James R. Larus, "Optimally Profiling and Tracing
Programs," ACM SIGPLAN-SIGACT Principles of Programming Languages
(POPL), January 1992, pp. 59-70.
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.