Re: Debugging of optimized code

bill@amber.ssd.csd.harris.com (Bill Leonard)
Fri, 13 Jan 1995 17:24:22 GMT

          From comp.compilers

Newsgroups: comp.compilers
Keywords: optimize, debug
Organization: Harris Computer Systems, Ft. Lauderdale FL
References: 95-01-036

SAND_DUANE@tandem.com writes:
> A major irritant with all the optimizing compilers I know of, is that
> symbolic source-level debugging of the optimized version of released
> products becomes impossible or very very flaky.


This is a very good question, one which seems to be coming up more often.


> The programmers at my company (Tandem Computers) are very frustrated
> with this. With our prior range of stack-oriented CISC machines, the
> compilers could do so little to the code that they didn't bother to
> do much, and so symbolic debugging support and post-mortem analysis
> was excellent in all released products. With the MIPS compilers we
> now use, anyone trying to recreate a problem in its original context
> needs to learn how to puzzle out what the compiler did across an
> entire procedure, at the machine level. The standard answer seems
> to be to rebuild the product with ALL optimizations off, and try to
> recreate the customer's problem in that version.
>
> Is anyone doing much better than this?


Depending on how you interpret "much better", the answer could be yes. For
instance, Harris Computer Systems compilers and debuggers can at least give
an accurate indication of where you are in the program, even in the
presence of instruction scheduling and complex optimizations. Furthermore,
they are designed to do even better; it just takes time to implement the
improvements.


> Is the industry satisfied with this?


I hope the answer is no. We certainly are not. I think the current state
is more due to allocation of priorities than satisfaction. Customers still
tend to be more interested in higher performance, or in a compiler for the
latest chip, than in better debugging support of optimized code. And as long
as new chips keep coming out at the current rate, compiler writers will be
kept pretty busy just cranking out compilers for the next architecture.


> I'm interested in hearing
> * Are improvements in this area practical?


Very much so.


> * Are there any production compilers now in use or under further
> development, which support reliable symbolic debugging of
> maximally or not-quite-maximally optimized code?


As I say, our compilers (Harris Computer Systems) and debuggers are
designed to generate and understand debug information that describes the
optimized code about as well as anything could. They take advantage of the
DWARF2 debug information standard, which is very flexible and powerful in
describing what the compiler has done.


Current releases of our compilers support accurate line-number information
for generated code, so that the debugger can always know exactly where you
are. They also produce accurate stack tracebacks, unless you stop the
program in the middle of a procedure prologue or epilogue, which is an
unusual thing to do. (DWARF2 can handle even these cases, but we don't
currently support that feature.) Accurate location information for variables
that live in different places (a register here, a stack slot there) at
different points in the code is not presently generated, but could be.
These are just a couple of examples.
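The variable-location problem can be pictured with a simplified model of a
DWARF2 location list, which records where a variable lives over each range
of code addresses. This is a minimal sketch under stated assumptions: the
struct, the names `loc_entry`, `find_location` and `count_locs`, and all
addresses are hypothetical, and a real `.debug_loc` entry holds a location
expression rather than a bare register number or frame offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of a DWARF2-style location list: each entry
   says where a variable lives while the PC is in [low_pc, high_pc). */
typedef enum { LOC_REGISTER, LOC_FRAME_OFFSET } loc_kind;

typedef struct {
    uint32_t low_pc;   /* first address covered (inclusive) */
    uint32_t high_pc;  /* end of the range (exclusive) */
    loc_kind kind;     /* register or stack slot */
    int      where;    /* register number, or byte offset from the frame base */
} loc_entry;

/* Made-up example: `count` sits in register 5 inside a loop, is spilled
   to the stack afterwards, and has no location at all from 0x1a0 on. */
static const loc_entry count_locs[] = {
    { 0x100, 0x140, LOC_REGISTER,      5 },
    { 0x140, 0x1a0, LOC_FRAME_OFFSET, -8 },
};

/* Returns the entry describing the variable at pc, or NULL when the
   variable has no valid location there; a debugger should then report the
   value as unavailable instead of printing a stale one. */
const loc_entry *find_location(const loc_entry *list, size_t n, uint32_t pc)
{
    for (size_t i = 0; i < n; i++) {
        if (pc >= list[i].low_pc && pc < list[i].high_pc)
            return &list[i];
    }
    return NULL;
}
```

The NULL case is what makes the "warning if the value is not current"
requested later in the thread possible: the debugger knows when it has no
trustworthy location rather than guessing.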


One major issue that needs to be addressed, however, is that of space.
Object files and executables will get larger as the compiler has to produce
more debug information to describe the program. DWARF2 makes every effort
to minimize space, but one can only go so far.


> * Are we fundamentally limited by the existing "standard" codefile
> debugger info formats?


Certainly existing debug info formats are a limiting factor. COFF, for
instance, cannot possibly describe optimized code without drastic
extensions, for which there is no standard. DWARF2 is the only format I
have seen that is adequate for that purpose.


> * Is DWARF2, whatever that is, going to solve that problem? When?


By itself, no, but it (or something like it) is a necessary enabling
technology. As for when, it is already here in a few compilers, but it
will be some years yet, I expect, before compilers take full advantage of
all the features of DWARF2. Will DWARF2 ever appear on PC-class machines?
Doubtful.


> * What are some clear ways of describing the "current location
> in code" in source terms, when code motion and cross-statement
> scheduling has blurred the statement boundaries?


DWARF2 has the concept of a "portion of a line". When the code for a
statement has been interspersed with code from other statements, the debug
info flags the first instruction for the line as the beginning; the other
instructions are just flagged as belonging to that line. So the debugger
can tell where the beginning of the code for that line is.
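A simplified model of the line-number table helps show how this works. In
the DWARF2 line-number program, the "beginning of a line" marker roughly
corresponds to the `is_stmt` flag; everything else here (the `line_row` type,
the helper functions, the addresses in `sched_rows`) is hypothetical, and the
real table is encoded as a compact state-machine program rather than an
array. The sketch answers the two questions a debugger asks: which line a PC
belongs to, and where to put a breakpoint for a given line.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified row of a DWARF2-style line-number matrix. A row applies from
   its address up to (not including) the next row's address. */
typedef struct {
    uint32_t address;
    int      line;
    int      is_stmt;  /* nonzero: this row marks the beginning of the line */
} line_row;

/* Hypothetical scheduled code with lines 10 and 11 interleaved. */
static const line_row sched_rows[] = {
    { 0x100, 10, 1 },  /* beginning of line 10 */
    { 0x104, 11, 1 },  /* beginning of line 11, moved above the rest of 10 */
    { 0x108, 10, 0 },  /* more code for line 10, not its beginning */
    { 0x10c, 11, 0 },  /* more code for line 11 */
    { 0x110, 12, 1 },
};

/* Which source line does pc belong to? Rows must be sorted by address.
   Returns -1 if pc precedes the table (end-of-sequence is ignored here). */
int pc_to_line(const line_row *rows, size_t n, uint32_t pc)
{
    int line = -1;
    for (size_t i = 0; i < n && rows[i].address <= pc; i++)
        line = rows[i].line;
    return line;
}

/* Where to plant a breakpoint for a line: the row flagged as its beginning.
   Returns -1 if no code was generated for the line. */
long breakpoint_for_line(const line_row *rows, size_t n, int line)
{
    for (size_t i = 0; i < n; i++)
        if (rows[i].line == line && rows[i].is_stmt)
            return (long)rows[i].address;
    return -1;
}
```

Both lines 10 and 11 own two address ranges each, but only one row per line
carries the beginning flag, so a breakpoint on line 10 lands at 0x100 rather
than at the later fragment at 0x108.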


A more interesting question is whether debugger functionality needs to
become more sophisticated. For instance, does single stepping by line make
much sense when the code for that line has been interspersed among the code
for several other lines? It is really hard to communicate to the user
exactly what part of the line has been executed, so what is the right
debugging paradigm here?
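To make the difficulty concrete, here is one hypothetical way a debugger
could report partial progress through interleaved lines. It assumes
straight-line code with no branches (so "before the PC" means "already
executed"), and the type, function and sample addresses are invented for
illustration; it is not a feature of any particular debugger.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t address;  /* start of this piece of code */
    int      line;     /* source line the piece belongs to */
} line_piece;

/* Hypothetical interleaved code for lines 10 and 11, then line 12. */
static const line_piece pieces[] = {
    { 0x100, 10 }, { 0x104, 11 }, { 0x108, 10 }, { 0x10c, 11 }, { 0x110, 12 },
};

/* Straight-line code only: count how many pieces of `line` have finished
   when the program is stopped at pc, and how many pieces it has in total.
   A piece is finished once the next piece starts at or before pc. */
void line_progress(const line_piece *p, size_t n, int line, uint32_t pc,
                   int *done, int *total)
{
    *done = 0;
    *total = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].line != line)
            continue;
        (*total)++;
        uint32_t end = (i + 1 < n) ? p[i + 1].address : UINT32_MAX;
        if (end <= pc)
            (*done)++;
    }
}
```

Stopped at 0x108, lines 10 and 11 have each had one of their two pieces
executed, so neither "at line 10" nor "at line 11" tells the whole story.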


> * Just how much added performance is gained, by those code
> optimizations that make source-level debugging particularly hard?


Of course, that depends on the code, the compiler, and the target
architecture, but gains of 50% or more are not all that uncommon.
Unrolling a loop on a RISC machine, for instance, may easily gain 50% in
performance.
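For readers unfamiliar with the transformation, this C sketch shows a
summation loop and the kind of 4-way unrolled form an optimizer might
produce; the function names are hypothetical, and no particular speedup is
implied. It also shows why unrolling hurts debugging: the single source
statement `s += a[i]` becomes four statements, and the running total is split
across `s0` to `s3` until the end, so there is no one place a debugger could
read `s` from.

```c
#include <stddef.h>

/* Straightforward loop: one add, one compare and one branch per element. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by 4: loop overhead is paid once per four
   elements, and the independent partial sums give the instruction
   scheduler room to overlap loads and adds. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)  /* remainder loop for the leftover 0 to 3 elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```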


> For the 10% of our code that determines system-level performance and
> system prices, and hence our profits and salaries, Tandem's
> programmers are willing to forgo symbolic debugging in exchange for
> highest possible performance. But for the other 90% of our shipped
> code that executes less often, Tandem would be very happy to trade
> 10-20% of the performance of those pieces, in exchange for reliable
> symbolic debugging (and for more reliable compilers). Our customers'
> application programmers would, also.


But now you get into all sorts of issues involving building and maintaining
the application. First of all, how do you partition your source code? Some
programmers put one function per source file, some group the functions into
"modules", etc. For languages like C++ or Ada, it often makes sense to put
all the source for a class or package in the same source file. But then what
if only one of those functions really needs to be optimized?


Even if you get the source partitioned so that you can optimize only part,
you have to have build procedures that will keep track of what is supposed
to be optimized and what is not. Will your maintenance programmers
understand them?


Then there is the question of how long the list of optimized functions
remains accurate. As the application is maintained, new code is added, old
code is modified or deleted. Very likely the code that used to take lots
of time doesn't anymore.


All of these issues conspire to make it much easier to just optimize the
whole thing or nothing at all.


> Clearly, code subjected to inter-statement common subexpr elimination
> can't support debug-time modification of variables, or manually forced
> branches to arbitrary unlabelled statements.


Even manually forced branches to labelled statements can cause a problem,
because the compiler made certain assumptions about what would be in which
registers at that point.
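The following hypothetical pair of C functions illustrates the CSE problem
from the quoted text. `f_source` is the code as written, with `new_a`
standing in for a value the user stores into `a` from the debugger between
the two statements; `f_after_cse` models what common subexpression
elimination effectively turns it into. Because the optimized version reuses
`t`, which the compiler computed earlier and would typically keep in a
register, the change to `a` has no effect. A forced branch into the middle of
such code fails for a related reason: it can skip the instructions that set
up `t`.

```c
/* Source as written: a * b is evaluated twice. */
int f_source(int a, int b, int new_a)
{
    int x = a * b + 1;
    a = new_a;           /* simulates the user changing a in the debugger */
    int y = a * b + 2;   /* sees the new value of a */
    return x + y;
}

/* Roughly what CSE produces: a * b is evaluated once into a temporary. */
int f_after_cse(int a, int b, int new_a)
{
    int t = a * b;       /* common subexpression, likely held in a register */
    int x = t + 1;
    a = new_a;           /* the store happens, but nothing reads a again */
    int y = t + 2;       /* still uses the old product */
    return x + y;
}
```

With `new_a == a` both versions return the same result; with
`f_source(3, 4, 0)` the source semantics give 15, while the optimized
version still gives 27.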


> I'm asking for reliable debugging support of these essentials:
> * Where am I, in the program?
> * Stmt breakpoints are possible at some (if not most) statement
> boundaries, with at least one point available in each basic block.
> * Displaying of variable values, with warning if the value is
> not current for the current claimed stmt location.
> * Stack traceback of callers.


All of these are possible today, but require a more sophisticated debug
information format than, say, COFF.


--
Bill Leonard
Harris Computer Systems Corporation
2101 W. Cypress Creek Road
Fort Lauderdale, FL 33309
Bill.Leonard@mail.hcsc.com
--

