Related articles:
Parser for C++ implemented in Java davidp@imec.be (2003-01-07)
Re: Parser for C++ implemented in Java theo@engr.mun.ca (Theodore Norvell) (2003-01-12)
From: Theodore Norvell <theo@engr.mun.ca>
Newsgroups: comp.compilers,comp.compilers.tools.javacc
Date: 12 Jan 2003 17:39:11 -0500
Organization: Memorial University of Newfoundland
References: 03-01-037
Keywords: C++, parse, Java
Posted-Date: 12 Jan 2003 17:39:11 EST
Patrick wrote:
> Does anybody has experience on parsing large C++ source code with
> javacc or another java based parser ? After reading the previous
> posts, it seems to be quite tricky, any information would be greatly
> appreciated.
>
> Patrick
I've been working on a parser for a C++ subset, using JavaCC. Yes, it is tricky.
Very tricky. I plan to put the parser in the public domain once I'm happy
with it, but that isn't quite yet. Here are some issues that make
it hard:
Interaction with the symbol table. How you treat identifiers depends
on whether they are declared as types or not. In some cases this
requires peeking ahead in the token stream in order to make the
decision. Consider
    a::b::c::d
Whether you treat this as a type name depends on the declaration of d.
JavaCC's semantic look-ahead and the ability to peek ahead in the token
stream make this possible.
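As a rough illustration of the look-ahead described above, here is a minimal sketch in plain Java (not JavaCC grammar code). The names Scope and QualifiedNameLookahead.isTypeName are hypothetical, and tokens are plain strings. In a JavaCC grammar, a check of this kind would presumably sit inside a semantic LOOKAHEAD( { ... } ) and peek with getToken(n) instead of indexing a list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scope: a name maps either to a nested scope or to an
// "is this a type?" flag.
class Scope {
    final Map<String, Scope> nested = new HashMap<>();
    final Map<String, Boolean> isType = new HashMap<>();

    Scope declareScope(String name) {
        Scope s = new Scope();
        nested.put(name, s);
        return s;
    }

    void declare(String name, boolean type) {
        isType.put(name, type);
    }
}

class QualifiedNameLookahead {
    // Peek at the tokens starting at pos, following id :: id :: ... :: id,
    // and report whether the final identifier is declared as a type.
    // Nothing is consumed; the caller then chooses the production.
    static boolean isTypeName(List<String> tokens, int pos, Scope global) {
        Scope scope = global;
        int i = pos;
        while (i + 1 < tokens.size() && tokens.get(i + 1).equals("::")) {
            scope = scope.nested.get(tokens.get(i));
            if (scope == null) return false; // unknown qualifier
            i += 2;
        }
        if (i >= tokens.size()) return false; // input ended after "::"
        return scope.isType.getOrDefault(tokens.get(i), false);
    }
}
```

The point of the sketch is only that the decision depends on the last component, so the parser has to look past every :: before it can commit.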
Distinguishing declarations from function definitions. At first I tried
doing this by looking ahead for a comma or semicolon. This turned
out not to work when a class specification appears as a decl_specifier,
so I ended up combining the nonterminal for function definitions
with that for simple declarations.
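To illustrate the idea of reading the shared prefix once and deciding at the first token that distinguishes the two forms, here is a small Java sketch over a list of string tokens. It is not the grammar described above. DeclOrDefinition and classify are hypothetical names, and the sketch assumes class specifiers have no base clause. It treats "class-key [name] { ... }" as a single decl_specifier, so the semicolons inside a class body cannot be mistaken for the end of a declaration.

```java
import java.util.List;
import java.util.Set;

class DeclOrDefinition {
    enum Kind { SIMPLE_DECLARATION, FUNCTION_DEFINITION }

    static final Set<String> CLASS_KEYS = Set.of("class", "struct", "union", "enum");

    static Kind classify(List<String> tokens) {
        int parenDepth = 0;
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            if (CLASS_KEYS.contains(t)) {
                // Skip "class-key [name] { ... }" as one specifier.
                i++;
                if (i < tokens.size() && !tokens.get(i).equals("{")) i++; // optional name
                if (i < tokens.size() && tokens.get(i).equals("{")) i = skipBraces(tokens, i);
                continue;
            }
            if (t.equals("(")) {
                parenDepth++;
            } else if (t.equals(")")) {
                parenDepth--;
            } else if (parenDepth == 0) {
                // Outside any parameter list: the first of these tokens decides.
                if (t.equals("{")) return Kind.FUNCTION_DEFINITION;
                if (t.equals(";") || t.equals(",") || t.equals("=")) return Kind.SIMPLE_DECLARATION;
            }
            i++;
        }
        throw new IllegalArgumentException("incomplete declaration");
    }

    // Returns the index just past the brace that matches tokens[open].
    static int skipBraces(List<String> tokens, int open) {
        int depth = 0;
        for (int i = open; i < tokens.size(); i++) {
            if (tokens.get(i).equals("{")) depth++;
            else if (tokens.get(i).equals("}") && --depth == 0) return i + 1;
        }
        throw new IllegalArgumentException("unbalanced braces");
    }
}
```

For example, "struct S { int x ; } make ( ) { ... }" is classified as a function definition even though a semicolon appears before the body's opening brace.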
Declaration before use. Mostly C++ has declaration before use. But
it doesn't (entirely) within classes. Consider
    class A {
        int foo() { T (i) ; i = 0 ; return i ; }
        typedef int T ;
    } ;
Is T(i) a function call or a variable declaration?
My planned solution is to delay parsing of the function bodies
until the end of the class specification. JavaCC should make this fairly
easy (using a custom token manager), but I haven't implemented it yet.
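A minimal sketch of the delayed-body idea, in plain Java over a list of string tokens rather than with a JavaCC token manager. DeferredBodies and its methods are hypothetical names, and the typedef handling assumes the simple form "typedef <one-token type> <name> ;". The first pass sets each brace-delimited body aside unparsed and records the typedef names. Only afterwards is a body examined, by which time every member of the class is known.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeferredBodies {
    final Set<String> typeNames = new HashSet<>();
    final List<List<String>> pendingBodies = new ArrayList<>();

    // First pass over the tokens between the class's outer braces:
    // record typedef names and set aside each function body unparsed.
    void scanClassBody(List<String> tokens) {
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            if (t.equals("{")) {
                int close = matchingBrace(tokens, i);
                pendingBodies.add(tokens.subList(i + 1, close));
                i = close + 1;
            } else if (t.equals("typedef") && i + 3 < tokens.size()) {
                // Simplifying assumption: typedef <type> <name> ;
                typeNames.add(tokens.get(i + 2));
                i += 4;
            } else {
                i++;
            }
        }
    }

    // Second pass, run after scanClassBody: decide whether a body that
    // starts with "X ( y ) ;" is declaring y (X names a type) or calling X.
    boolean startsWithDeclaration(List<String> body) {
        return body.size() >= 5
            && typeNames.contains(body.get(0))
            && body.get(1).equals("(")
            && body.get(3).equals(")")
            && body.get(4).equals(";");
    }

    // Returns the index of the brace that matches tokens[open].
    static int matchingBrace(List<String> tokens, int open) {
        int depth = 0;
        for (int i = open; i < tokens.size(); i++) {
            if (tokens.get(i).equals("{")) depth++;
            else if (tokens.get(i).equals("}") && --depth == 0) return i;
        }
        throw new IllegalArgumentException("unbalanced braces");
    }
}
```

Applied to the body of class A above, the saved tokens of foo begin with T ( i ) ;, and because the typedef of T has been recorded by then, the statement is treated as a declaration of i.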
Templates. I'm not implementing templates. But if I were, I'd want to delay
parsing until the template is instantiated. Again JavaCC should make this
possible. To make this work you also need to design the symbol table so it
can be backed up to the right place to provide the right context for the parse.
Telling when the decl-specifiers stop.
A simple declaration in C++ is of the form
    (decl_specifier)* (init_declarator)* ";"
In some cases it can be hard to tell when to jump out of the
first loop. This sounds easy, but there are some subtle issues.
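One such subtlety, offered as an example and not necessarily one of the cases meant above: an identifier that names a type counts as a decl_specifier only if no type specifier has been seen yet, so in "unsigned T ;" the T is the name being declared even when T is a typedef name. The Java sketch below shows a loop exit test along those lines. DeclSpecifierScan, its keyword sets, and the typeNames parameter are hypothetical, and the keyword sets are deliberately incomplete.

```java
import java.util.List;
import java.util.Set;

class DeclSpecifierScan {
    static final Set<String> NON_TYPE_SPECIFIERS =
        Set.of("const", "volatile", "static", "extern", "typedef", "inline");
    static final Set<String> BUILTIN_TYPE_SPECIFIERS =
        Set.of("int", "char", "long", "short", "unsigned", "signed",
               "float", "double", "void", "bool");

    // Returns the index of the first token that is not a decl_specifier,
    // i.e. where the first loop of the production should stop.
    static int endOfDeclSpecifiers(List<String> tokens, Set<String> typeNames) {
        boolean seenType = false;
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            if (NON_TYPE_SPECIFIERS.contains(t)) {
                i++;
            } else if (BUILTIN_TYPE_SPECIFIERS.contains(t)) {
                seenType = true;
                i++;
            } else if (!seenType && typeNames.contains(t)) {
                // A type name is a specifier only while no type has been seen.
                seenType = true;
                i++;
            } else {
                break; // start of the declarator list (or the ";")
            }
        }
        return i;
    }
}
```

With T known as a type, "const T x ;" stops at index 2, while "unsigned T ;" and "T T ;" both stop at index 1.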
JavaCC's flexibility is very useful in dealing with some of these issues.
But it is clear that C++ evolved in an environment where parsing was mostly
bottom-up. So I've sometimes thought that the job might be easier with
a bottom-up parser generator such as CUP. Maybe this is just the grass
seeming greener on the other side of the hill.
If you don't need the accuracy a compiler requires (say, for computing
metrics), there are a lot of shortcuts you can take. But an accurate parse
of full ISO C++ is a considerable investment of effort.
Cheers,
Theodore Norvell