Re: C Compiler in C++

thp@cs.ucr.edu
12 May 2002 00:07:26 -0400

          From comp.compilers

Related articles
C Compiler in C++ stanarevic@hotmail.com (2002-05-08)
Re: C Compiler in C++ loewis@informatik.hu-berlin.de (2002-05-12)
Re: C Compiler in C++ dnovillo@redhat.com (Diego Novillo) (2002-05-12)
Re: C Compiler in C++ journeyman@compilerguru.com (2002-05-12)
Re: C Compiler in C++ thp@cs.ucr.edu (2002-05-12)
Re: C Compiler in C++ rbates@southwind.net (Rodney M. Bates) (2002-05-13)
Re: C Compiler in C++ lex@cc.gatech.edu (Lex Spoon) (2002-05-13)
Re: C Compiler in C++ alexc@world.std.com (2002-05-17)
Re: C Compiler in C++ alexc@world.std.com (2002-05-17)
Re: C Compiler in C++ journeyman@compilerguru.com (2002-05-17)
Re: C Compiler in C++ Bart.Vanhauwaert@nowhere.be (2002-05-17)
[2 later articles]
| List of all articles for this month |

From: thp@cs.ucr.edu
Newsgroups: comp.compilers
Date: 12 May 2002 00:07:26 -0400
Organization: University of California, Riverside
References: 02-05-039
Keywords: C, OOP, design
Posted-Date: 12 May 2002 00:07:26 EDT

Nemanja Stanarevic <stanarevic@hotmail.com> wrote:
: Dear colleagues -
: I am writing a complete C compiler in C++ as an academic exercise and
: I need an advice related to the design of parse tree data structure
: for C declarations.
[...]
: Does anyone know of a good way (or generally accepted way) of
: representing the parse tree for C declarations using a similar class
: hierarchy?


I can tell you of a way that worked well for me and the student in my
compiler class, who wrote Lex/Yacc-based compilers for the Tiger
language (per the appendix of Appel's book).


Yacc (i.e., Bison) generated the tree somewhat mindlessly. Then we
made two top-down passes. The first pass did semantic checking and
decorated the nodes with all of the information needed for the second
pass, which generated intermediate code (essentially quadruples
expressed in a notation that is a subset of C).


There was a base class Node declared (pretty much) as follows:


    struct Node { // level-one base class
        Node() : lineNo(yylineno), nextTok(yytext) {}
        virtual ~Node() {};
        string pos() {
            return "At symbol \"" + nextTok + "\" on line " + itoa(lineNo) +",\n";
        }
        virtual string show() = 0;
        virtual void typeCheck() = 0;
        virtual void codeGen() = 0;
    private:
        int lineNo;
        string nextTok;
    };


Note that we lifted diagnostic stuff up to this level, and we made our
passes by calling typeCheck (respectively, codeGen) on the root node
and letting each interior node call typCheck (respectively, codeGen)
on its children.


For each nonterminal there was a level-two base class derived from
Node, For each rule having a given nontermainal as its left-hand side,
there was a leaf class derived from that nonterminal's level-two base
class. The leaf class's constructor would corresponding exactly to
the right hand side of the rule Yacc entries looked like:


    Widget : foo1 foo2 foo3 { $$ = new FooWidget($1,$2,$3); }
                  | bar1 bar2 { $$ = new BarWidget($1,$2); }
                  ;


The constructors dumped the significant data from $1, $2, ... into
constants inside the nodes. (We weren't rigid in following our
tactics; for instance, we reused our IfElseStmt class for
if-statements, setting the last two arguments to 0.)


The symbol table was filled in during the first pass. With each
identifier it associated a pointer to that symbol's declaration in the
tree. A variables attributes were held in data members of the
corresponding tree node.


We departed from this strategy where the grammar was designed to
construct lists. In those cases we simply constructed the
corresponding STL list rather than letting it get encoded into a long
branch of the tree.


We also departed from the strict division into passes because
sequences of declarations required multiple passes, one to install
identifiers into symbol tables and another to actually do
typechecking.


Obviously C is a much more complex language, and you may (and probably
will) run into some difficulties trying to follow this approach
rigidly, but I don't see any "show stoppers". For example, IIRC,
there are places in C where declarations must be processed in order to
continue the parsing correctly, i.e., the scanner needs to consult the
symbol table to determine what kind of a token to return. But I don't
see any difficulty in say typechecking certain subtrees while the
overall tree is still under construction.


Tom Payne


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.