Basic-Block Profiling Isn't Always Accurate

larus@primost.cs.wisc.edu (James Larus)
Mon, 8 Mar 1993 22:49:38 GMT

          From comp.compilers

Related articles
Basic-Block Profiling Isn't Always Accurate larus@primost.cs.wisc.edu (1993-03-08)
Re: Basic-Block Profiling Isn't Always Accurate anik@crhc.uiuc.edu (1993-03-09)
Re: Basic-Block Profiling Isn't Always Accurate glew@pdx007.intel.com (1993-03-11)
Re: Basic-Block Profiling Isn't Always Accurate pohua@gomez.intel.com (1993-03-12)
Re: Basic-Block Profiling Isn't Always Accurate pardo@cs.washington.edu (1993-03-14)
| List of all articles for this month |

Newsgroups: comp.arch,comp.compilers
From: larus@primost.cs.wisc.edu (James Larus)
Keywords: architecture, performance, tools
Organization: University of Wisconsin, Madison -- Computer Sciences Dept.
Date: Mon, 8 Mar 1993 22:49:38 GMT

In porting QPT to the SPARC, we found a limitation on the accuracy of
basic-block profiling (which is the type performed by most profilers).
The problem is that blocks that end with an annulled conditional branch do
not always execute their last instruction. Assuming that the instruction
executes (as it would with a non-annulled conditional) leads to profiles
that are up to 5-10% high (on SPEC92 integer benchmarks). The only
general solution is to profile edges, not blocks, in the control-flow
graph [1].


Many machines have annulled conditional branches that squash (do not
execute) the instruction in a branch's delay slot when the branch is
either taken or not taken. On the SPARC (and most other machines), the
instruction in the delay slot is executed if the branch is taken and is
squashed if the branch falls through.


An edge profiler can easily determine the number of times that the
instruction in the delay slot executes by looking at the frequency on the
branch's "taken" edge. A block profiler cannot determine this value,
because, in general, block frequencies do not uniquely determine edge
frequencies.


For example, consider the code:


....
bne,a xxx
add %o1, %o1, 1
...
xxx: ....


The ADD instruction is in the delay slot of an annulled branch (SPARC
notation). It executes each time the branch jumps to label xxx. If no
other instruction jumps to label xxx or falls through into its block, a
profiler could determine how many times the ADD executes by looking the
execution frequency of basic block xxx. However, if these conditions do
no hold, the profiler can't determine how many times the ADD executes.


The effect of this problem on an instruction profile depends on the
frequency of annulled branches and the size of basic blocks. In
non-numeric programs, with small blocks, the effect can be surprisingly
large.


/Jim


James Larus
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706
larus@cs.wisc.edu
608-262-9519




[1] Thomas Ball and James R. Larus, "Optimally Profiling and Tracing
        Programs," ACM SIGPLAN-SIGACT Principles of Programming Languages
        (POPL), January 1992, pp. 59-70.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.