Parsing C typedefs

drw@math.mit.edu (Dale R. Worley)
Wed, 15 Jan 1992 18:14:39 GMT

          From comp.compilers

Related articles
Handling the typedef problem with a modifiable grammar bburshte@pyrps5.eng.pyramid.com (1992-01-09)
Parsing C typedefs drw@math.mit.edu (1992-01-15)
| List of all articles for this month |
Newsgroups: comp.compilers
From: drw@math.mit.edu (Dale R. Worley)
Keywords: C, parse, yacc
Organization: MIT Dept. of Tetrapilotomy, Cambridge, MA, USA
References: 92-01-043
Date: Wed, 15 Jan 1992 18:14:39 GMT

[To the moderator: Even though the topic has been closed, I hope that
this article will be accepted anyway, since it describes how these
problems were solved in a production compiler that handled all the
tricky cases of ANSI C.]
[This is the absolute definite last word on parsing typedefs. Really. -John]


I worked on the parser for the Compass C compiler. The solution we used
to the typedef vs. identifier problem was to have the scanner return two
different token types, based on the symbol table. In order to make this
work correctly, a few things have to be done carefully:


An identifier has to be inserted into the symbol table when the
punctuation following its declarator (',', ';', or '=') is read. This
is not particularly tricky, since an LALR(1) parser never has to look
ahead at that point, and our parser looks ahead only when needed.


Making sure that tricky declarations like


typedef int foo;
foo;
foo x;
volatile foo;
int foo;


are handled correctly is done by having the grammar of declarations
enforce the rule that "if a typedef name is used as a declarator, the
declaration-specifiers of the declaration must contain a type-specifier".
Thus, if "foo" is a typedef, then


volatile foo;


is an empty declaration (and is illegal), while


int foo;


is a redeclaration of "foo" as an integer variable. To do this, the
grammar must keep track of whether a type-specifier has been seen in the
declaration-specifiers as it reads the next token and tries to decide
whether it is another declaration-specifier or a declarator. Thus you
have grammar rules like:


declaration-specifiers-without-type-specifier ::=
declaration-specifiers-without-type-specifier type-modifier


declaration-specifiers-with-type-specifier ::=
declaration-specifiers-with-type-specifier type-modifier
| declaration-specifiers-without-type-specifier type-specifier
| declaration-specifiers-with-type-specifier type-specifier


declaration ::=
declaration-specifiers-without-type-specifier
non-typedef-declarator
| declaration-specifiers-with-type-specifier
non-typedef-declarator
| declaration-specifiers-with-type-specifier
typedef-declarator


If the grammar is aware of these distinctions, it can be made LALR(1).
This multiples the grammar rules, of course, but this is not the only
place that a practical C grammar has several clones of a single rule in
the Standard's grammar.


Dale Worley Dept. of Math., MIT drw@math.mit.edu
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.