Re: AST for several modules

"Rodney M. Bates" <rod.bates@wichita.boeing.com>
14 Jun 2000 12:44:28 -0400

          From comp.compilers

Related articles
AST for several modules Andreas.Ames@Tenovis.com (Andreas Ames) (2000-06-10)
Re: AST for several modules idbaxter@semdesigns.com (Ira D. Baxter) (2000-06-14)
Re: AST for several modules rod.bates@wichita.boeing.com (Rodney M. Bates) (2000-06-14)
Re: AST for several modules cfc@world.std.com (Chris F Clark) (2000-06-14)
Re: AST for several modules iank@idiom.com (2000-06-14)
Re: AST for several modules danwang+news@cs.princeton.edu (Daniel C. Wang) (2000-06-20)
Re: AST for several modules Andreas.Ames@Tenovis.com (Andreas Ames) (2000-06-20)
Re: AST for several modules joachim.durchholz@halstenbach.com.or.de (Joachim Durchholz) (2000-06-20)
Re: AST for several modules vugluskr@unicorn.math.spbu.ru (2000-07-18)
| List of all articles for this month |
From: "Rodney M. Bates" <rod.bates@wichita.boeing.com>
Newsgroups: comp.compilers
Date: 14 Jun 2000 12:44:28 -0400
Organization: The Boeing Company
References: 00-06-038
Keywords: parse

Andreas Ames wrote:
> I wonder if somebody could give me some hints about how to parse and
> build a single ( or several connected ) ASTs for several
> modules/files, e. g. for a whole program or software system.


I have done this in two different ways. One is to just build a single
AST in the obvious way for all the modules. The very topmost node
(sometimes I call it the superroot) has as children or close
descendents, the roots of the AS subtrees for modules.


> If one can build a single AST from several modules, are then the
> normal techniques for data and control flow analysis efficently
> applicable?


You can then pretty much use the usual techniques, adapting any
semantic cross links (e.g. identifier references pointing to their
declarations) to the shape of this super tree.


This technique requires that you do the scan/parse/AST-build from
source code, for each of the modules in the closure. The Gnat Ada
front end uses this method.


You might want the capability to have the AST for each module in
memory, or not, independently of other modules. This could be used to
allow memory to be freed for no-longer needed modules, or to load the
previously built AST for a module, rather than rebuilding it from
source.


I have done this by making indirect, all "pointers" which refer to a
node in a different module's AST. They point to a node (call it an
ExternalRef node) which belongs to the referring AST. This identifies
the target module in a way that is independent of whether its AST is
in memory, perhaps by a file name. It further identifies the node
within that AST in a way that is similarly independent, probably a
node number. (To have an AST be stored in a file, you will have to
have something like node numbers anyway, to denote pointers strictly
within that AST.) Finally, the ExternalRef node has a plain pointer
which leads to the target node when the target AST is in memory, or is
NIL otherwise. I have glossed over some detail here, but hopefully
you get the idea.


Many people are very reluctant to repeat the work of recompiling
a module to build a tree, preferring to keep "precompiled"
modules around that will be often be used by many client
modules. Over the years, I have grown continuously more
skeptical that this actually can bring any performance gain,
and I think it may often be a loss.


To be of much use, a (partially) compiled form needs to be
some kind of linked data structure. Saving linked data
structure in a file (sometimes called "pickling") requires
that the pointers be converted to some other form (e.g. node
numbers) when writing the file and back when reading.
Furthermore, these forms usually end up occupying 5 to 10
times the number of bytes as source code. It is very hard
to read and unpickle that many bytes any faster than just
recompiling the smaller number of source code bytes.


I have spent some effort in designing AST grammars for
compactness, and it always involves trading away convenience
in tree traversal. Also, there is a large volume of very
stereotyped, boring code to write, to do the pickling/unpickling.
You can avoid this by using a language which will do it
automatically (Modula-3 is the only example I am familiar with),
or a source code generator, which will generate the pickling
routines.


Rodney Bates


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.