From: | dwight@pentasoft.com (Dwight VandenBerghe) |
Newsgroups: | comp.compilers |
Date: | 7 May 1998 16:52:53 -0400 |
Organization: | All USENET -- http://www.Supernews.com |
References: | 98-05-004 |
Keywords: | semantics, syntax |
On 4 May 1998 22:54:49 -0400, Neal Harder <nealh@dinigroup.com> wrote:
> ...My problem
>is that I am having trouble effectively traversing the parse tree to
>perform semantic checking (in a second pass). ..
First, note that your parse tree structure can be a source of a lot of
trouble until you get it right. In contradistinction to what some of
the texts may tell you, don't make the parse tree, or abstract syntax
tree, reflect your grammar. The requirements of the grammar have to
do with yacc and lalr(1) constraints and factoring out common pieces
for your own understanding and so forth. The abstract syntax of your
language may be sort of like your non-terminals, but it may not be,
either. It is the meta-syntax of your language, and it represents a
sort of ideal: the ideal form that sentences in your language
represent, regardless of how you happen to have chosen to express them
in your particular choice of syntax, or grammar productions.
Getting the AST "right" means that you've worked with it until it has
boiled down into its essentials. You have nearly the minimum number
of tree nodes necessary, and the contents of each node are just what's
needed to represent what your language can express. There usually
aren't very many such nodes or operators, as compared to the concrete
syntax (the way that the expressions are actually entered by the
user).
So first of all, I suspect that a lot of the cumbersome work that
you're up against, and the inefficiency, and the klunkiness of it all,
has a lot to do with having a noisy AST representation. Clarify that,
and I think that you'll find that it really isn't so bad to use the
tree for your type checking.
Secondly, there are two places where semantic information is usually
held: in the AST, and in the symbol table. You keep everything that
has to do with a variable or function itself in the symbol table, and
you keep everything that has to do with the use of a variable or
function in the AST. The normal way to segment out the work is to
fill in as much as you can into the symbol table as soon as possible,
because this generally helps you along in the rest of the compiler
later. (For example, immediately recording the type and name of a
typedef in the symbol table makes parsing the next statement easier
than if you waited until a later pass. And yes, typedefs and struct
names and so forth also go into the symbol table.) But opinions vary
as to whether to try to mix expression and function-argument type
checking along with AST construction. On the one hand, it usually
(not always; depends on the language) is possible to either
completely, or almost completely, check types during the parse itself.
On the other hand, mixing the type checking and the parsing can lead
to confusion, as they are two different things, in the end. I tend to
type-check at parse time for simple languages with straight-forward
parsers, and to defer it to a second pass for more complex languages.
Part of the reason to defer type-checking is that you normally have to
alter the AST to reflect things like implicit casts and short-
circuiting. It just gets complicated to try to build the AST and
insert casts etc all at the same time.
By the way, I *never* use what you are calling a parse tree. And I
never use an AST that mirrors the non-terminals in the grammar. My
ASTs tend to look sort of like Scheme, like lambda-notation S-records
... in other words, like true abstract syntax. A typical design might
be something like this... an AST is comprised of records that all have
the following form:
op : enum (PLUS, MINUS, IF, IFELSE, WHILE, BLOCK, ASSIGN,
VARIABLE, ...}
loc : source coords (file, line, col)
type : enum { INT, FLOAT, DOUBLE, CHAR, ...}
ref : if INT constant then value
if FLOAT or DOUBLE constant then index into constant table
if CHAR constant then index into string table
if VARIABLE then index into symbol table
etc...
children: variable-length list of pointers to this node's
subtrees, if any.
Finally, I do not envy your task, especially due to the language that
you're using (C++). Although I've used C++ every day of my work life
for four years now, I grow less and less fond of it, especially for
what you're attempting. If you must use C++, then take a look at
Yacc++ (from Compiler Resources) - it takes away some of the agony.
Strangely, in this day and age of good free language tools, it's
actually worth the $995 ticket price. But if you have any degree of
freedom and can choose another language, you might find Objective Caml
worth a look, or perhaps even the Gentle compiler construction system.
OCaml is simply fabulous, but might be too much of a stretch for your
employer. Gentle has the nice feature of sitting on top of flex and
yacc, yet being a full-fledged compiler-writer's language. It can
automate the kind of semantic checking you're discussing (it has, for
example, a "sweep" command that runs a constraint against a tree).
Both of these are quite enjoyable to use. Yacc++ is about as good as
it gets for raw C++, though.
And good luck. It's not an easy thing that you're attempting. Expect
to learn a lot; even though many CSci types consider compilers to be a
solved problem, that doesn't mean it's a trivial one. You do it over
and over and you start to get the hang of it, like anything else.
Dwight
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.