Re: parsing C and C++, was Compiler Compiler Compiler

"Mike Dimmick" <>
31 Mar 2001 02:49:19 -0500

          From comp.compilers

From: "Mike Dimmick" <>
Newsgroups: comp.compilers
Date: 31 Mar 2001 02:49:19 -0500
Organization: Compilers Central
References: 01-03-095 01-03-122 01-03-133
Keywords: parse, C, C++
Posted-Date: 31 Mar 2001 02:49:19 EST

"Kevin Szabo" <> wrote in message
> Mike Dimmick <> wrote:
> |Many 'new' programming languages tend to be C or C++ derivatives.
> |There are a number of problems with the C and C++ syntaxes which can
> |only be solved by absorbing semantic information into the parser.
> |Unfortunately, this usually means that information must be entered
> |into semantic tables whilst processing a complete rule. YACC really
> |doesn't handle this at all well.
> I've never tried to parse C/C++. Could you give an example or two
> of the problems (or point me to a reference).

Well, this has been discussed here recently and also on (by myself,
mostly!). I don't really fancy
duplicating that again... You can find some information in Ed
Willink's Ph.D. thesis, which can be found at - look in chapter
4 and appendix F. Willink made a good effort to parse C++ _without_
the use of semantic information, but it requires essentially a two
stage parse - once to get the information into an Abstract Syntax
Tree, then reparsing the tree with semantic information to sort out
places where the original source has been misparsed.
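
To give a rough idea of what such a two-stage parse involves, here is a
minimal sketch in Python. It is not Willink's actual design; the node
classes, the function names and the single `a * b ;` statement shape
are all assumptions made up for illustration. Stage one builds a tree
node without deciding what the statement means, and stage two consults
a set of known type names to pick a reading.

```python
from dataclasses import dataclass


@dataclass
class Ambiguous:
    """A statement of the form `a * b ;` whose meaning is not yet known."""
    left: str
    right: str


@dataclass
class Declaration:
    """`T * x ;` read as: declare x as a pointer to T."""
    type_name: str
    var_name: str


@dataclass
class Multiply:
    """`a * b ;` read as: multiply a by b and discard the result."""
    left: str
    right: str


def parse_statement(tokens):
    """Stage 1: purely syntactic. Records the shape, decides nothing."""
    if len(tokens) == 4 and tokens[1] == "*" and tokens[3] == ";":
        return Ambiguous(tokens[0], tokens[2])
    raise SyntaxError(f"unsupported statement: {tokens}")


def resolve(node, type_names):
    """Stage 2: walk the tree with semantic information in hand."""
    if node.left in type_names:
        return Declaration(node.left, node.right)
    return Multiply(node.left, node.right)


tree = parse_statement("T * x ;".split())
print(resolve(tree, {"T"}))   # T is a known type: a declaration
print(resolve(tree, set()))   # T is not a type: a multiplication
```

The same token sequence yields two different trees depending only on
whether the first name is known to be a type, which is the kind of
misparse the second pass has to sort out.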

> The problem I have seen with some parsing strategies is having the
> lexer bind a symbol before it gets to the parser, that is, trying
> to look it up in the symbol tables and resolve whether a token is a
> variable/type/unbound before it hits the parser.

Which is why semantic predicates are moderately helpful - the token
remains (for example) IDR, but at an appropriate point in the parse,
the generated parser tests the predicate to determine whether to enter
a given (sub) rule. PCCTS and ANTLR I know have these predicates; I
have recently been informed that Yacc++ has them too, but that tool
costs money...
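
As a hand-written illustration of the idea (not PCCTS, ANTLR or Yacc++
syntax), the sketch below is a tiny recursive-descent parser in Python.
The class, its method names and the three statement forms it accepts
are assumptions invented for the example. The lexer only ever produces
plain names; the parser itself tests a predicate against the type table
when choosing which rule to enter, and a typedef updates that table in
the middle of the parse.

```python
class Parser:
    """Toy parser for `typedef A B ;`, `T * x ;` and `a * b ;`."""

    def __init__(self, tokens, type_names):
        self.tokens = tokens
        self.pos = 0
        self.type_names = set(type_names)  # the "semantic table"

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def is_type_name(self):
        """Semantic predicate: asked by the parser, not by the lexer."""
        return self.peek() in self.type_names

    def statement(self):
        if self.peek() == "typedef":
            self.expect("typedef")
            base = self.identifier()
            name = self.identifier()
            self.expect(";")
            self.type_names.add(name)  # table updated during the parse
            return ("typedef", base, name)
        if self.is_type_name():        # predicate guards the sub-rule
            return self.pointer_rule("declare_pointer")
        return self.pointer_rule("multiply")

    def pointer_rule(self, kind):
        left = self.identifier()
        self.expect("*")
        right = self.identifier()
        self.expect(";")
        return (kind, left, right)

    def parse_program(self):
        result = []
        while self.peek() is not None:
            result.append(self.statement())
        return result


source = "typedef int T ; T * x ; a * b ;"
for node in Parser(source.split(), {"int"}).parse_program():
    print(node)
```

Note that `T * x ;` is only read as a declaration because the typedef
was entered into the table before the parser reached it; move the
typedef after that statement and the same tokens parse as a multiply.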

Of course, the use of semantic information to direct the parse causes
a lot of problems, because the grammar cannot be genuinely said to be
context-free. This loses some of the advantages of using a parser
generator - giving a higher level of abstraction, as well as allowing
access to techniques that are extremely tedious to implement 'by
hand.' It also means that tools such as my project work cannot be
implemented merely by a syntax analysis; semantic analysis is also
required.

Hope this helps,

Mike Dimmick
