Re: Q: How to preserve symbol table info through multiple pass compilation?

Jerry Leichter <leichter@smarts.com>
4 May 1997 00:30:33 -0400

From comp.compilers

Related articles
Q: How to preserve symbol table info through multiple pass compilation jlee@espresso.ucsd.edu (Justin Lee) (1997-04-22)
Re: Q: How to preserve symbol table info through multiple pass compila cfc@world.std.com (1997-04-30)
*Re: Q: How to preserve symbol table info through multiple pass compila leichter@smarts.com (Jerry Leichter)* (1997-05-04)**
Re: Q: How to preserve symbol table info through multiple pass compila bduncan@tiac.net (Bruce Duncan) (1997-05-08)

| List of all articles for this month |

From:	Jerry Leichter <leichter@smarts.com>
Newsgroups:	comp.compilers
Date:	4 May 1997 00:30:33 -0400
Organization:	System Management ARTS
References:	97-04-153
Keywords:	symbols

| I'm currently writing a multiple pass (tree-based) compiler. So far
| I've accomplished the following:
|
| 1) lexed input into tokens
| 2) parsed tokens and created a parse tree
|
| Next I will accomplish the semantic checking and code gen with
| multiple passes on the parse tree.
|
| My question is this: what is the best way to structure a symbol table
| so that the scope relationships are preserved/updated through multiple
| passes of the parse tree? ...

I can tell you how I organized things in one compiler. The thing to
realize is that there are two quite different things you need to be
able to do with a "variable":

1. While parsing, given the name of a variable, find the
data retained about it.
2. While walking the tree, given a "variable reference" node,
find the data the compiler has retained about the
variable.

Once you've got the tree built, there's no need to take a variable
name and find the information stored about it any more.

So, split the "symbol table" into two pieces. Call them the name map
and the variable data table. The variable data table contains the
compiler-generated information that defines a single variable - type,
size, where allocated in memory, etc. This information refers to a
variable *as it will exist at run time*; it has nothing to do with how
things were *named* in the text. If x exists in two scopes, there are
two entries in the variable data table, one for each scope, exactly as
if there were two variables, x and y, in the same scope.

The name map simply takes a variable name and maps it to an entry in
the variable data table. The mapping is *scope dependent*.

The compiler I did (an instructional compiler for a subset of
Modula-2) does this in a way that was perhaps not maximally efficient,
but is clear and easy to understand. Pass 1 of the compiler builds an
AST, but had no semantic content; a "variable name" or "function name"
leaf node pointed logically into the name map. Any two instances of
"x" in the tree pointed to the same place. At this point, the
variable data table is empty.

Now, walk the tree from the top down. As you enter each block, walk
its variable declarations. From a variable declaration, construct a
new variable data table element. Push a pointer to that element onto
the front of a stack of references in the name map entry. Then walk
the rest of the block. (If you allow declarations anywhere, you just
have to keep jumping back an forth between these two modes of
operation, always processing a declaration before any of the scope to
which it applies.) When you see a "variable name" leaf node, look at
the name map. If its reference stack is empty, the variable is
undeclared; report an error. Otherwise, replace the "variable" leaf
node with a "reference" node, pointing to the variable data table
element found at the top of the stack. (This is the one from the
innermost declaration - normally the one you want.) At the end of a
block, walk the name table and pop off any references pushed onto the
reference stack for the current block. The variable data table
elements remain unchanged.

When you're done walking the entire AST, there should be no "variable
name" or "function name" nodes left - they should all have been
replaced by "reference" nodes, now pointing to variable data table
elements.

In reality, there are all sorts of additional things to deal with.
For example, in error messages, you may need to refer to the variable
name even after you're re-written the "variable name" nodes to
reference nodes. So the variable data table elements should point
back to the names declared for them. Non-nested scopes - e.g., C
structs, or things like namespaces in C++ - you need to deal with them
in a more general way than the simple stacking discipline outline
above. This can be more or less complex, depending on how common such
scopes are, and how large.

Still, the important principle is that the name map has entries
corresponding to textual names in the source text, and the variable
data table has entries corresponding to "locations" that will be used
at run time. The correspondance between these depends on where you
are in the code, and only needs to be maintained for a fairly short
time.

-- Jerry
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Q: How to preserve symbol table info through multiple pass compilation?

Jerry Leichter <leichter@smarts.com>4 May 1997 00:30:33 -0400

Jerry Leichter <leichter@smarts.com>
4 May 1997 00:30:33 -0400