Trying to parse C++

Jim Roskind
11 Apr 1996 23:34:58 -0400

          From comp.compilers

Related articles
HELP: How to parse forward references in C++ classes (Tomas Telecky) (1996-04-10)
Trying to parse C++ (Jim Roskind) (1996-04-11)

From: Jim Roskind
Newsgroups: comp.compilers
Date: 11 Apr 1996 23:34:58 -0400
Organization: Compilers Central
References: 96-04-060
Keywords: C++, parse

Tomas Telecky writes:
> I am trying to write some kind of C++ syntactical analyzer using the
> Roskind's C++ bison grammar. This grammar contains two different
> tokens for identifiers - IDENTIFIER and TYPEDEFname. The lexer has to
> decide which one of them to return when it scans a common
> identifier. This decision is made accordingly to the contents of the
> symbol table.
> There are, however, forward references in C++ classes definitions
> possible, like
> class X { X(){ i = 0; } int i; };
> The "i" identifier is used before it is declared in class X (at least
> lexically). The lexer must return correct value for "i" though.
> [All the approaches I know of use either guessing, backing up, or both.
> You'll be happier with a more powerful tool than bison -- visit
> comp.compilers.pccts. -John]

Your analysis is correct, but it gets worse than that with the rules
that the ANSI committee passed. The initializers for default
arguments to methods also have to be parsed after the entire class
definition has been read. The result is a three-pass compiler (over
each class), which can be emulated with yacc. The rule is that you have
to parse this "stuff" "as-if" it appeared after the class definition.
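
To make the "as-if" rule concrete, here is a small piece of code that is
legal only because of it. The class Counter and its members are made up
for illustration, and the default member initializer is C++11 syntax that
postdates this post. Both the default argument and the body of next()
name members that are declared further down, so neither can be resolved
on a first pass through the class.

```cpp
// Hypothetical example; all names are illustrative only.
struct Counter {
    // kDefaultStep and value are not declared yet at this point, but the
    // default argument and the body are treated as if they appeared
    // after the closing brace of the class.
    int next(int step = kDefaultStep) { return value += step; }

    static constexpr int kDefaultStep = 2;
    int value = 0;
};
```

Calling next() with no argument on a fresh Counter picks up the default
step declared after the method.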

1) When you hit stuff that can't be evaluated until the end of the
class (i.e., method bodies; default arguments to methods), sock it
away as a stream of tokens *without* parsing it *at all*.

2) When you reach the end of a class definition, cause the stored away
stuff to be regurgitated into the lexer-parser stream with proper
restatement of the method prototypes.

3) If you want to be complete... re-parse everything now with the full
context of the class, and be sure that under "reconsideration" nothing
has changed. This will detect such official errors as:

          typedef int MY_INT;
          class evil {
              MY_INT an_int_member;  // MY_INT is the file-scope typedef here
              typedef long MY_INT;   // will cause "reconsideration" discrepancy
          };

...oh well... ANSI succeeded at making this stuff both hard to read,
and hard to parse.
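
One way to picture the step 3 check is the sketch below. It is only a
model, not part of the grammar: Scope, ClassScope, the integer
declaration ids, and check_evil_example() are all hypothetical names.
Each use of a name inside the class is recorded together with the
declaration it resolved to at that moment, and once the class is
complete every use is looked up again and compared.

```cpp
#include <map>
#include <string>
#include <vector>

// Maps a name to an arbitrary id for the declaration it denotes.
using Scope = std::map<std::string, int>;

struct Use {
    std::string name;
    int decl;  // declaration id found when the name was first seen
};

struct ClassScope {
    const Scope* enclosing;  // declarations visible outside the class
    Scope members;           // declarations seen so far inside the class
    std::vector<Use> uses;   // every name use, with its first resolution

    // Class members hide enclosing declarations; -1 means "not found".
    int lookup(const std::string& name) const {
        auto m = members.find(name);
        if (m != members.end()) return m->second;
        auto e = enclosing->find(name);
        return e != enclosing->end() ? e->second : -1;
    }

    void use(const std::string& name) { uses.push_back({name, lookup(name)}); }
    void declare(const std::string& name, int id) { members[name] = id; }

    // Called at the closing brace: re-resolve every use in the complete
    // class and collect the names whose meaning has changed.
    std::vector<std::string> reconsider() const {
        std::vector<std::string> changed;
        for (const Use& u : uses)
            if (lookup(u.name) != u.decl) changed.push_back(u.name);
        return changed;
    }
};

// Replays the "evil" example by hand (assumed ids 1, 2, 3).
std::vector<std::string> check_evil_example() {
    Scope file_scope{{"MY_INT", 1}};   // typedef int MY_INT;
    ClassScope evil{&file_scope, {}, {}};
    evil.use("MY_INT");                // MY_INT an_int_member;
    evil.declare("an_int_member", 2);
    evil.declare("MY_INT", 3);         // typedef long MY_INT;
    return evil.reconsider();
}
```

In this model check_evil_example() reports MY_INT, because the use on the
first member line resolved to the file-scope typedef but resolves to the
member typedef once the whole class is known.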

Sorry to bear the painful news.

I've been too busy to try to recraft the parser to effectively support
steps 1 and 2 above. They are really not that hard, but they do
create a pretty messy system.
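
For what it is worth, a rough sketch of steps 1 and 2 over an already
tokenized class might look like the following. It is not code from the
grammar distribution. The function defer_method_bodies() and the marker
strings are hypothetical, tokens are plain strings, the input is assumed
to be exactly one class definition, and only inline method bodies are
handled (default arguments and class-qualified restatement of the
prototype are left out). The marker strings stand in for dedicated token
codes that a real lexer would never produce from source characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical markers bracketing each regurgitated method.
const std::string kReplayBegin = "<replay-begin>";
const std::string kReplayEnd = "<replay-end>";

std::vector<std::string> defer_method_bodies(const std::vector<std::string>& in) {
    std::vector<std::string> out;       // what the parser sees first
    std::vector<std::string> deferred;  // bodies socked away unparsed
    std::vector<std::string> decl;      // tokens of the current member declaration
    int depth = 0;                      // brace depth; 1 == directly in the class

    for (std::size_t i = 0; i < in.size(); ++i) {
        if (depth == 1 && in[i] == "{" && !decl.empty() && decl.back() == ")") {
            // Step 1: a method body. Copy it verbatim up to the matching
            // brace, preceded by a restatement of its prototype.
            deferred.push_back(kReplayBegin);
            deferred.insert(deferred.end(), decl.begin(), decl.end());
            int body_depth = 0;
            do {
                if (in[i] == "{") ++body_depth;
                else if (in[i] == "}") --body_depth;
                deferred.push_back(in[i]);
                ++i;
            } while (body_depth > 0 && i < in.size());
            --i;  // the for loop advances past the closing brace
            deferred.push_back(kReplayEnd);
            out.push_back(";");  // the class keeps only the prototype
            decl.clear();
            continue;
        }
        out.push_back(in[i]);
        if (in[i] == "{") { ++depth; decl.clear(); }
        else if (in[i] == "}") { --depth; decl.clear(); }
        else if (in[i] == ";") { decl.clear(); }
        else if (depth == 1) { decl.push_back(in[i]); }
    }
    // Step 2: regurgitate the stored bodies after the class definition.
    out.insert(out.end(), deferred.begin(), deferred.end());
    return out;
}
```

Fed the tokens of class X { X(){ i = 0; } int i; }; this yields the class
with a bare X(); prototype, followed by the marked replay of X() { i = 0; },
so that "i" is in the symbol table before the body is finally parsed.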

I suggest to programmers that they *not* use the above feature
involving forward references. Not only is it hard on the compiler, it
is hard on the reader. IMHO inline method definitions should be
pulled out of the class elaboration so that they can be automagically
pulled into an alternate source file and un-inlined. ...again... just
my opinion.

When you've gotta' parse it all... it is indeed very hard... just as
hard as it is to read when this weird stuff appears.


P.S. It is actually a *little* harder than indicated above... as the
class *might* end at a point that is *not* at file scope. This
happens with local classes, as well as with nested classes. Your
parser, which accepts such regurgitation, must be willing to accept
method definitions arriving at such points. Fortunately, the
regurgitation can be flagged by special tokens which can't be induced by
characters in the input stream... so the language is not really broken
by the above re-scanning.
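
A minimal sketch of such flagging, assuming a lexer that hands out tokens
from a queue: ReplayingLexer, TokenKind, and push_replay() are invented
names, not anything from the yacc grammar. The ReplayBegin and ReplayEnd
kinds are only ever created by push_replay(), never by scanning
characters, and replayed tokens are delivered ahead of the remaining
input wherever the class happened to end.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { Identifier, Punctuator, ReplayBegin, ReplayEnd, EndOfInput };

struct Token {
    TokenKind kind;
    std::string text;
};

class ReplayingLexer {
public:
    // `source` stands in for the tokens still to be scanned from the file.
    explicit ReplayingLexer(std::deque<Token> source) : source_(std::move(source)) {}

    // Called at the end of a class: queue stored tokens between markers.
    void push_replay(const std::vector<Token>& stored) {
        replay_.push_back({TokenKind::ReplayBegin, ""});
        replay_.insert(replay_.end(), stored.begin(), stored.end());
        replay_.push_back({TokenKind::ReplayEnd, ""});
    }

    // Replayed tokens take priority over the rest of the input.
    Token next() {
        std::deque<Token>& q = replay_.empty() ? source_ : replay_;
        if (q.empty()) return {TokenKind::EndOfInput, ""};
        Token t = q.front();
        q.pop_front();
        return t;
    }

private:
    std::deque<Token> source_;
    std::deque<Token> replay_;
};
```

The grammar can then allow a marked method definition at any point where
a class definition may end, including inside a function body or an
enclosing class.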
--
Jim Roskind fax: 415.528.4159 voice: 415.Java.Jim or 415.528.2546
PGP 2.6.2 Key fingerprint = 0E 2A B2 35 01 9B 5C 58 2D 52 05 9A 3D 9B 84 DB

