Re: Symbol-Table and Parse-trees

cgbatema@undergrad.math.uwaterloo.ca (Cameron Bateman)
15 Aug 1999 10:45:24 -0400

          From comp.compilers

Related articles
Symbol-Table and Parse-trees blizzar@hem1.passagen.se (1999-08-12)
Re: Symbol-Table and Parse-trees rkrayhawk@aol.com (1999-08-13)
Re: Symbol-Table and Parse-trees delta-t@t-online.de (Leif Leonhardy) (1999-08-15)
Re: Symbol-Table and Parse-trees cgbatema@undergrad.math.uwaterloo.ca (1999-08-15)
| List of all articles for this month |
From: cgbatema@undergrad.math.uwaterloo.ca (Cameron Bateman)
Newsgroups: comp.compilers
Date: 15 Aug 1999 10:45:24 -0400
Organization: University of Waterloo
References: 99-08-048 99-08-064
Keywords: symbols, parse

To supplement what RKRayhawk said in the article 99-08-064,
where RKRayhawk <rkrayhawk@aol.com> wrote:




>The symbol table can be owned by lexical analysis! The lexer can at


...


>handed to the parser. This usually is a fact
>free_of_the_context_of_the_parse. That is, a chunk of text looks a lot
>like a symbol even before we try to fit it into the syntax.
>
>So sometimes, you will see a strategy where the lexer detects,
>instantiates, and recognizes references to symbols. Sometimes, the


As Rob points out, your choice of how to use the symbol table depends
alot on how your lexer, parser and semantic analysis engine interact.
In what I like to call the "lex/yacc" view of the world, the parser is
the king. That is, the parser is responsible for getting tokens from
the lexer and also triggering semantic analysis. There are other
(arguably better, depending on your situation) ways of viewing this
relationship, but since most parser generators that I know of work
this way, I'll assume that's what you are using.


Under the above relationship, it doesn't really matter who "owns" the
symbol table, as long as the semantic analyzer can see and modify it.
Now, if you use the "lex/yacc" model your semantic analyzer can work
in one of two ways:


1) You can use yacc's reduce actions and semantic stack to do your
semantic analysis in lock-step with your parsing; thus the parser must
be able to "see" the symbol table also.


2) You can use yacc's reduce actions to build a separate parse tree
structure, and then in subsequent passes, run an attribute grammar
evaluator over it. In this case the parser never "sees" the symbol
table.


Either way works. (1) is considered more "hacky" and generally more
limited than (2), but it is also easier to implement and understand if
you never used attribute grammars before (unless you have a handy
dandy automatic AG evaluator sitting around).




>Each symbol has a scope, meaning it is not visible forever. So you
>need to generate scope identifiers of some sort, and associate each


...


>In C there are global type variables that can be specified outside of
>functions, their scope ends when the current compilation


...


The best way I have come across to implement symbol tables for block
scope languages is as follows (credit for this goes to my compiler
prof, Gordon V. Cormack at Univ. of Waterloo; although I don't know if
he originated the idea):


Implement the table as a stack. Call the global scope "level 0".
Every global symbol you come across is simply pushed onto the stack.
Now, when you enter a new scope (i.e you enter a function definition
like main), you mark the current top-of-stack with either a pointer or
a dummy symbol entry. When the current scope closes, you pop all of
the symbols off of the stack until you reach the first scope
marker. Note also that when you do a search for a symbol, all you do
is start at the top of the stack and do a linear search until you find
the first occurrence of the symbol.


If you're really concerned about the linear search not being fast
enough, then you can add a secondary hash index that points to the top
most (inner most scope) instance of a symbol name and then thread a
list of all instances of that symbol name down through the stack.


I should note that while this method works well for languages like C
which have very simple scoping rules, it can be very nasty with
certain Algol-type languages where scoping rules can get *extremely*
elaborate and might not conform well to a single stack model.




>I am probably not good enough to take on the question about initializing
> int i = 3 * 2 + k;
>in a compact format like a newsgroup. But I will over you the hint that this
>excellent example could represent two VERY distinct challenges for you.
>


What you want to do, is have very specialized code in your semantic
analyzer for doing expression resolution. In particular:


1) Fold constants where possible. i.e in the example above, i = 3 * 2
+ k can be folded to either i = 6 + k if k is not a constant and into
a constant 6+k otherwise.


2) If possible, don't differentiate between assignments in statements
and in initialization: i.e use something like:


<init> ::= <type> <var>;
<var> ::= <var-name>; | <assign>
<var-name> ::= identifier
<assign> ::= <var-name> = <expr>;
<expr> ::= .....




If you are using semantic-stack style analysis, then you can have a
semantic record for the <expr> which can generate code for that
expression and then the <assign> record contains code that assigns the
result of <expr>'s code to the variable represented by the <var-name>.
This way, you can also use the <assign> and <expr> productions and
analysis logic for normal statements.


Hope this helps,


Cam
--
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
/ Cameron Bateman She is eternal, long before \
/ 4B MATH/CS Nation's lines were drawn, \
/ University of Waterloo When no flags flew, \


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.