Re: [Q] Annotating AST with flow info

David Chase <>
15 Apr 1998 23:27:50 -0400

          From comp.compilers

Related articles
[Q] Annotating AST with flow info maratb@CS.Berkeley.EDU (Marat Boshernitsan) (1998-04-13)
Re: [Q] Annotating AST with flow info (David Chase) (1998-04-15)
Re: [Q] Annotating AST with flow info (Robert Bernecky) (1998-04-15)
Re: [Q] Annotating AST with flow info (1998-04-18)

From: David Chase <>
Newsgroups: comp.compilers
Date: 15 Apr 1998 23:27:50 -0400
Organization: Natural Bridge LLC
References: 98-04-051
Keywords: analysis, AST

Marat Boshernitsan wrote:

> I am looking for references (to both tools and papers) on techniques for
> annotating AST with control and data flow information.

> ... In particular, I
> am interested in methods that support incremental update, but I am
> curious about non-incremental algorithms as well. (I am working on a
> language-independent framework for computing this information in a
> programming environment that uses AST for internal representation).

Get yourself over to Rice University's CS web pages, see what you can
find on the (variously mis-glyphed) Rn or R^n or R**n and/or Parascope
projects there. (Not much, it turns out.) The vectorizer (this was
about 15 years ago now, I feel old) used AST for its internal form,
and we computed various sorts of control flow and data flow
information from that AST (Favorite error message on detecting jumps
in and out of the body of a do loop: "Sleaze code detected,
optimization will be inhibited").

One of the things that I worked on was tools to support an AST-based
programming environment for Fortran. There were some theses relevant
to this that came out of there; I believe F.K. Zadeck did his
dissertation on incremental update of data flow info after program
changes (executive summary: deletion is hard) and L.M.E. Torczon did
hers on (I think) smart recompilation and/or reoptimization in the
presence of program updates. K.D. Cooper's was influenced by the idea
of whole programs being available for optimization, though I don't
think it was as directly relevant. M.A. Caplinger wrote most of the
actual editor/display stuff in the early versions of the system, and I
think his dissertation covered some of this. I'm not entirely sure
what Alan Carle wrote about, but I recall that he was studying
tree-to-tree diff or something like it.
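The "deletion is hard" point about incremental dataflow deserves a concrete illustration. The sketch below (my own toy code, not anything from R^n or Zadeck's thesis) uses CFG-node reachability as a stand-in for a forward dataflow problem: because facts only grow under edge insertion, an added edge can be handled by propagating just the delta, but a deleted edge may invalidate facts that have no other justification, so the safe fallback is recomputation from scratch.

```python
# Toy illustration: incremental insertion vs. non-incremental deletion
# in a monotone dataflow-style problem (reachability from root nodes).
from collections import defaultdict

class Reach:
    """Tracks, for each CFG node, the set of roots that reach it."""
    def __init__(self, roots):
        self.succs = defaultdict(set)
        self.roots = set(roots)
        self.reach = {r: {r} for r in roots}  # node -> reaching roots

    def add_edge(self, a, b):
        # Insertion is the easy direction: facts only grow, so we
        # propagate just the new facts along successor edges.
        self.succs[a].add(b)
        work = [(a, b)]
        while work:
            src, dst = work.pop()
            new = self.reach.get(src, set()) - self.reach.get(dst, set())
            if new:
                self.reach.setdefault(dst, set()).update(new)
                work.extend((dst, s) for s in self.succs[dst])

    def del_edge(self, a, b):
        # Deletion is the hard direction: a fact at b may or may not
        # have another justification, and a simple bit-vector solution
        # can't tell which. The safe fallback is to rebuild everything.
        self.succs[a].discard(b)
        edges = [(s, d) for s in self.succs for d in self.succs[s]]
        self.succs = defaultdict(set)
        self.reach = {r: {r} for r in self.roots}
        for s, d in edges:
            self.add_edge(s, d)

r = Reach([0])
r.add_edge(0, 1)
r.add_edge(1, 2)   # node 2 reachable only through the 0->1 edge
r.del_edge(0, 1)   # everything downstream must be reconsidered
```

Real incremental algorithms try to do better than the rebuild in `del_edge` by tracking which facts still have a supporting justification; the point of the sketch is only that insertion and deletion are asymmetric.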

One thing you should definitely do is keep your eyes open for mistakes
made in the early systems; several times, large hunks were thrown out
and redone from scratch. In particular, one result was that
programmers are not terribly interested in whatever weird syntactic
structure underlies their program, and they will ***NOT*** (notice the
emphasis?) use it if they are forced to edit expressions as if they
were syntax trees. (This is especially true for C and Fortran.)

One major obstacle that I recall was dealing with new, untyped
identifiers; at some reprehensible step in the development of the PE,
I think the system would prompt you right then and there for the
appropriate declaration information. If I recall correctly, this was
repaired sometime downstream by Scott Warren and Don Baker. Dealing
with "error nodes" is (or was) a tough/interesting problem; users are
not especially interested in the truism that most programs, while they
are being edited, do not parse. One possibility for the error nodes
might be to use type systems other than those that come with the
language; for instance, type inference over error nodes might let you
know that two error types are related, or might let you infer the type
of something from its use (this is probably not possible in C++).
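To make the error-node idea concrete, here is a minimal sketch (all names are illustrative, not from any real system): treat each error node as a fresh type variable and unify it with whatever its use sites demand, so two related error nodes end up sharing a type, and a hole used next to a float is inferred to be a float.

```python
# Toy unification-based inference over "error nodes" treated as
# fresh type variables. Concrete types are plain strings.
class TVar:
    """A type variable standing in for an unparsed/error node."""
    def __init__(self, name):
        self.name, self.bound = name, None

    def find(self):
        # Chase bindings to the representative type.
        t = self
        while isinstance(t, TVar) and t.bound is not None:
            t = t.bound
        return t

def unify(a, b):
    a = a.find() if isinstance(a, TVar) else a
    b = b.find() if isinstance(b, TVar) else b
    if a is b or a == b:
        return
    if isinstance(a, TVar):
        a.bound = b
    elif isinstance(b, TVar):
        b.bound = a
    else:
        raise TypeError(f"cannot unify {a!r} with {b!r}")

# Two error nodes that flow into the same context become related...
hole1, hole2 = TVar("error-node-1"), TVar("error-node-2")
unify(hole1, hole2)
# ...and a use such as `hole1 + 1.5` pins both down to float.
unify(hole1, "float")
```

As the posting notes, this works best in languages where a use determines a type; heavy overloading (as in C++) gives each use site many admissible types, so plain unification is not enough.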

On the plus side, having something that would look up all the
parameters to subroutines in the various
IMSL/EISPACK/LINPACK/you-name-it libraries was nice. On the minus
side, simply browsing those libraries (remember this was about 15
years ago) was a medium-sized problem all by itself. Altavista would
have been a nice thing to have for that :-).

Also, the design was somewhat warped by the relatively high cost of
computing back then (we started on Sun 1s). Nowadays, brute force
makes many problems much simpler.

Also, the Cornell Program Synthesizer papers (Teitelbaum and Reps,
generally) looked into propagating updates automatically.

Also, at CMU, predating the Rice work by a little, was something
called "Gandalf". I don't remember much about Gandalf, except this
vague idea that all problems were addressed with "take two action
routines and call me in the morning".

David Chase,
