Re: Parsing C++ headers?

mw@ipx2.rz.uni-mannheim.de (Marc Wachowitz)
16 Apr 1996 22:25:33 -0400

From comp.compilers

Related articles
Parsing C++ headers? 104316.1514@CompuServe.COM (John Mitchell) (1996-04-06)
Re: Parsing C++ headers? johnm@mitchell.org (1996-04-08)
Re: Parsing C++ headers? cag@Cs.Nott.AC.UK (Cleveland A Gibbon) (1996-04-11)
Re: Parsing C++ headers? johnm@mitchell.org (1996-04-14)
*Re: Parsing C++ headers? mw@ipx2.rz.uni-mannheim.de* (1996-04-16)**
Re: Parsing C++ headers? Graham.Parrington@ncl.ac.uk (Graham D Parrington) (1996-04-18)
Re: Parsing C++ headers? feb6399@osfmail.isc.rit.edu (1996-04-18)

| List of all articles for this month |

From:	mw@ipx2.rz.uni-mannheim.de (Marc Wachowitz)
Newsgroups:	comp.compilers,comp.lang.c++
Date:	16 Apr 1996 22:25:33 -0400
Organization:	Compilers Central
References:	96-04-033
Keywords:	C++, yacc, parse

John Mitchell (104316.1514@CompuServe.COM) wrote:
> I want to parse C++ header files, and extract information about the
> contained classes and their interface. A yacc-able grammar for this
> type of problem ( or superset of it, as for compilation) would be a
> great help.

You might also look at ftp://ftp.csl.sri.com/pub/dodd/btyacc.tar.gz -
the README file is appended below. Using a relatively "understandable"
(as far as one can say so for C++ ;-) grammar - such as from the ARM,
or the latest working draft - and paying a limited performance penalty
may be preferrable to a possibly somewhat faster "patchwork"-parser
recognizing constructs as early as possible.

                          -- Marc Wachowitz <mw@ipx2.rz.uni-mannheim.de>

BTYACC -- backtracking yacc.

BTYACC was created by Chris Dodd using ideas from many places and lots
of code from the Berkeley Yacc distribution, which is a public domain
yacc clone put together by the good folks at Berkeley. This code is
distributed with NO WARRANTEE and is public domain. It is certain to
contain bugs, which you should report to:
dodd@csl.sri.com

See the README.BYACC, NEW_FEATURES, and ACKNOWLEDGEMENTS files for more
about Berkeley Yacc and other sources of info.

BTYACC is a modified version of yacc that supports automatic
backtracking and semantic disambiguation to parse ambiguous grammars,
as well as syntactic sugar for inherited attributes (which tend to
introduce conflicts). Whenever a btyacc generated parser runs into a
shift-reduce or reduce-reduce error in the parse table, it remembers
the current parse point (yacc stack and input stream state), and goes
into trial parse mode. It then continues parsing, ignoring most rule
actions. If it runs into an error (either through the parse table or
through an action calling YYERROR), it backtracks to the most recent
conflict point and tries a different alternative. If it finds a
successful parse (reaches the end of the input or an action calls
YYVALID), it backtracks to the point where it first entered trial
parse mode, and continues with a full parse (executing all actions),
following the path of the successful trial.

Actions in btyacc come in two flavors -- {}-actions, which are only
executed when not in trial mode, and []-actions which are executed
regardless of mode. There are also inherited attributes, which look
like arguments (they're enclosed in `()') and act like []-actions.

What this buys you:

* No more lexer feedback hack. In yacc grammars for C, a standard
hack, know as the `lexer feedback hack' is used to find typedef names.
The lexer uses semantic information to decide if any given identifier
is a typedef-name or not and returns a special token. With btyacc,
you no longer need to do this; the lexer should just always return an
identifier. The btyacc grammar then needs a rule of the form:

typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]

While the hack works adequately well for parsing C, it becomes a
nightmare when you try to parse something like C++, where treating
an ID as a typedef becomes heavily dependent on context.

* Easy disambiguation via simple ordering. Btyacc runs its trials via
the rule ``try shifting first, then try reducing by the order that the
conflicting rules appear in the input file''. This means you can deal
with semantic a disambiguation rule like:
        [1] If it looks like a declaration it is, otherwise
        [2] If it looks like an expression it is, otherwise
        [3] it is a syntax error
[Ellis & Stroustrup, Annotated C++ Reference Manual, p93]

To deal with this, you need only put all the rules for declarations
before the rules for expressions in the grammar file.

* No extra cost if don't use it. Backtracking is only triggered when
the parse hits a shift/reduce or reduce/reduce conflict in the table.
If you have no conflicts in your grammar, there's no extra cost, other
than some extra code which will never be invoked.

* C++ and ANSI C compatible parsers. The parsers produced by btyacc
can be compiled with C++ correctly. If you `#define' YYSTYPE to be
some C++ type with constructor and destructor, everything will work
fine. My favorite is `#define YYSTYPE SmartPointer', where
SmartPointer is a smart pointer type that does garbage collection on
the pointed to objects.

BTYACC was originally written to make it easy to write a C++ parser
(my goal was to be able to use the grammar out of the back of the ARM
with as few modifications as possible). Anyone who has ever looked at
Jim Roskind's public domain C++ yacc grammar, or the yacc-based
grammar used in g++ knows how difficult this is. BTYACC is very
useful for parsing any ambiguous grammar, particularly ones that come
from trying to merge two (or more) complete grammars.

Inherited attributes in btyacc:

Inherited attributes look a lot like function arguments to non-terminals,
which is what they end up being in a recursive descent parser, but NOT
how they're implemented in btyacc. Basically they're just syntactic
sugar for embedded semantic actions and $0, $-1, ... in normal yacc.
btyacc gives you two big advantages besides just the syntax:
        1. it does type checking on the inherited attributes, so you don't
              have to specify $<type>0 and makes sure you give the correct`
              number of arguments (inherited attributes) to every use of a
              non-terminal.
        2. It `collapses' identical actions from that are produced from
              inherited attributes. This eliminates many potential reduce-reduce
              conflicts arrising from the inherited attributes.

You use inherited attributes by declaring the types of the attributes
in the preamble with a type declaration and declaring names of the
attributes on the lhs of the yacc rule. You can of course have more
than one rule with the same rhs, and you can even give them different
names in each, but the type and number must be the same.

Here's a small example:
%type <t1> lhs(<t1>, <t2>) /* lhs takes two inherited attributes */
stuff(<t1>, <t2>)
%%
lhs($i1, $i2) : { $$ = $i1 }
| lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }

This is roughly equivalent to the following yacc code:
lhs : { $$ = $<t1>-1; }
        | lhs [ $<t1>$ = $1; ] [ $<t2>$ = $<t2>0; ] stuff { $$ = $4; }
        ;

See the file `test/t2.y' for a longer and more complete example.
At the current time, the start symbol cannot have any arguments.
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Parsing C++ headers?

mw@ipx2.rz.uni-mannheim.de (Marc Wachowitz)16 Apr 1996 22:25:33 -0400

mw@ipx2.rz.uni-mannheim.de (Marc Wachowitz)
16 Apr 1996 22:25:33 -0400