Re: any performance profiling tools ??

cfc@world.std.com (Chris F Clark)
1 Oct 1997 23:45:18 -0400

          From comp.compilers

Related articles
any performance profiling tools ?? chunghsu@george.rutgers.edu (1997-09-23)
Re: any performance profiling tools ?? dccoote@werple.mira.net.au (1997-09-28)
Re: any performance profiling tools ?? everhart@gce.com (Glenn Everhart) (1997-09-30)
Re: any performance profiling tools ?? cfc@world.std.com (1997-10-01)
Re: any performance profiling tools ?? steve@blighty.com (1997-10-01)
Re: any performance profiling tools ?? ok@cs.rmit.edu.au (1997-10-02)
Re: any performance profiling tools ?? sanjay@src.dec.com (1997-10-02)
Re: any performance profiling tools ?? jason@reflections.com.au (Jason Patterson) (1997-10-03)

From: cfc@world.std.com (Chris F Clark)
Newsgroups: comp.compilers,comp.arch,comp.benchmarks
Date: 1 Oct 1997 23:45:18 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 97-09-084 97-09-119 97-09-126
Keywords: performance, testing

> My understanding of ATOM is that it works at the executable or object
> layer...


Yes, it takes an executable and the related shared libraries (which
the atom documentation calls "objects", just to confuse those of us
who think of objects as things that have not yet been linked) and
instruments them.


> What ATOM does is global optimization and I believe some
> customization for particular alpha chip. I've heard of 20% gains in
> execution speed through the thing, though YMMV as usual...


Actually, that is OM, a related but different tool. However, the
basic technology is the same.


> [Do any of these address the original question about profiling at a
> low enough level to count cache misses and the like? -John]


The original question seemed to ask for a way to catch cache misses
without simulation (which I take to mean at the sub-instruction level,
using some sort of on-chip counter). There are atom-based tools
(see below for what that means) which track cache misses, but they all
work by simulation. Each instruction which accesses memory is
instrumented, and a record is kept of the actual addresses which are
accessed as the program runs. A second phase of each cache-miss
tool then analyzes the addresses and simulates the contents of the
cache, counting each cache miss that occurs in the simulation. The
method is as accurate as the simulation is, but by its nature it misses
certain cache misses: those caused by the cache being changed by
other processes or by the kernel, since it can only simulate the
instructions in your own process. On the plus side, it is good enough
for benchmarking and chip-design questions. You can simulate cache
designs long before committing them to silicon and see their exact
effect on the code you care about.


An "actual" cache miss counter would require hardware support. I
don't know which chips (if any) provide such support. The basic
problem is that when a cache miss occurs the instruction is stalled,
and the chip must count the number of cycles the instruction is
stalled for and present those as a counter (register). Now, there is
a tool on DEC systems called kprofile which can access the on-chip
profiling counters, and if one of the counters tracks the desired
cache-miss information, then you are in luck. I presume that other
architectures have similar tools.


Above I mentioned the term "atom-based tool", which I feel merits some
explanation. Atom is an "enabling" technology, not a tool in its own
right. An atom-based tool is a program someone wrote which exploits
the atom technology. The purpose of an atom-based tool is to take
measurements of application programs and do something with that data.


The user of an atom-based tool feeds their application to the
atom/atom tool combination and gets out a modified application. When
that modified application is run, it collects the desired data. The
data is then analyzed to provide the answers the user requests.


The implementation of this is done by some really clever
sleight-of-hand.


Atom has a bunch of code which can read, understand, modify, and write
executable files. This code is effectively collected into an
executable-editing library (via the atom api). The atom tool writer
uses that api to construct a program (the atom tool's "instrumentation
phase") which figures out which instructions in an application need to
be modified and what measurements should be taken at each instruction.


When atom runs, it incorporates the instrumentation phase of the atom
tool into itself, and then reads the application, feeds it to the
instrumentation phase, and writes out the modified application.


There is a second part of the atom tool, called the "analysis phase",
which consists of subroutines which perform the desired measurements
and do whatever is desired with them. The atom api allows the
instrumentation phase to insert calls to these subroutines at any
point in the code (in fact, that is essentially the only code
modification which the api allows, which is why it is not an optimizer
per se). The impressive thing about atom is that you can insert these
subroutine calls into the middle of almost any code sequence, even
prolog and epilog code. (You can't put them in the middle of hardware
lock sequences, where the hardware needs the two instructions to be
sequential for correct semantics, but that isn't a big restriction.)
Atom will preserve the appropriate registers, create a stack frame,
etc., and restore everything afterward, so that the resulting
application is totally unaffected.


Thus, to write a cache tool, you write these two parts. First, you
figure out which instructions can affect the cache in your model and
write an instrumentation phase which tells atom to instrument those
instructions. Second, you write an analysis phase which simulates the
effect of those instructions on your cache model. The result is that
your model is given the exact stream of instructions executed by the
application, and you can calculate the exact effects you are trying
to model.


Disclaimer: I consult for DEC and work on "third degree", an
atom-based tool that does memory leak detection (and related checks).


Hope this helps,
-Chris


*****************************************************************************
Chris Clark                  Internet   : compres@world.std.com
Compiler Resources, Inc.     CompuServe : 74252,1375
3 Proctor Street             voice      : (508) 435-5016
Hopkinton, MA 01748 USA      fax        : (508) 435-4847 (24 hours)
--

