Re: Maintaining scope while parsing C with a YACC grammar

"Ira Baxter" <idbaxter@semdesigns.com>
Fri, 13 May 2011 17:46:39 -0500

          From comp.compilers

Related articles
Maintaining scope while parsing C with a YACC grammar eliben@gmail.com (eliben) (2011-04-25)
Re: Maintaining scope while parsing C with a YACC grammar bobduff@shell01.TheWorld.com (Robert A Duff) (2011-04-26)
Re: Maintaining scope while parsing C with a YACC grammar bobduff@shell01.TheWorld.com (Robert A Duff) (2011-04-26)
Re: Maintaining scope while parsing C with a YACC grammar eliben@gmail.com (eliben) (2011-04-28)
Re: Maintaining scope while parsing C with a YACC grammar bobduff@shell01.TheWorld.com (Robert A Duff) (2011-05-02)
Re: Maintaining scope while parsing C with a YACC grammar torbenm@diku.dk (2011-05-03)
Re: Maintaining scope while parsing C with a YACC grammar paul@paulbmann.com (Paul B Mann) (2011-05-06)
Re: Maintaining scope while parsing C with a YACC grammar idbaxter@semdesigns.com (Ira Baxter) (2011-05-13)
Maintaining scope while parsing C with a Yacc grammar cfc@shell01.TheWorld.com (Chris F Clark) (2011-06-12)
| List of all articles for this month |
From: "Ira Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: Fri, 13 May 2011 17:46:39 -0500
Organization: Compilers Central
References: 11-04-036 11-04-038 11-05-003 11-05-007
Keywords: C, parse, types
Posted-Date: 14 May 2011 17:34:10 EDT
X-RFC2646: Format=Flowed; Original

"Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message
> No, sorry. I'd be interested to hear if anybody built a C parser that
> didn't use any feedback into the parser from a symbol table to deal
> with the typedef issue.


Our DMS Software Reengineering Toolkit does exactly this.
(www.semanticdesigns.com/Products/DMS/DMSToolkit.html).


We use GLR parsing machinery, which means we can decouple parsing from
semantic hackery. The grammar for this is just as you'd expect, with
productions for expression statements and declarations. The GLR
parser builds an ambiguous tree with maximal sharing of the subtrees.
At the end of the parse, you have a parse DAG with ambiguity nodes.


Name resolution is handled by an attribute grammar evaluator with some
special effects. The AGE is pretty general and we use it for lots of
tree-structured analyses; this is pretty nice for nested scopes,
building control flow graphs, etc. For name and type resolution, the
attributes passed up and down the tree are, as you might expect,
reference to symbol scopes and/or type declarations. The special
effects of interest here are the ability to declare an aborted
attribute evaluation at any tree node; this will kill off any
dependent attribute computation and delete the offending tree node and
its otherwise-unreferenced children (remember, there may be sharing,
so a node may have more than one reference). What we do to handle the
T* x issue is check for consistency of the types; if T is a type, then
T*x as an expression makes no sense and that part of the AGE dies off.
If T is numeric type, then T* x makes no sense as a declartion, and
that part of the AGE dies off. Destroying the "bad" subtree removes
the incorrect parse from the tree, and thus the final tree has no
ambiguity nodes and a completed name/type resoultion with symbol
tables that is correct. The C parser is question has been used in
anger to carry out static analysis of software systems with usome
19,000 compilation units. We think it is quite robust.


This scheme works really nicely for C, C++, COBOL and other languages,
and it means we can build grammars without worrying about semantic
constraints. The grammars are thus nice and clean, and the AGEs are
pretty clean too because of the separation of concerns.


You can do this by mixing symbol lookup and parsing, but then it
generally gets a lot messier to build both.


--
Ira Baxter, CTO
www.semanticdesigns.com



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.