|Fuzzy parsing info sought firstname.lastname@example.org (1994-08-10)|
|Re: Fuzzy parsing info sought kendall@pot.East.Sun.COM (1994-08-15)|
|Re: Fuzzy parsing info sought email@example.com (Terence Parr) (1994-08-19)|
|From:||Terence Parr <firstname.lastname@example.org>|
|Date:||Fri, 19 Aug 1994 16:53:55 GMT|
Much can be done using "fuzzy" parsing to collect definitions in C++.
However, I have taken a different approach (I have just rapidly built such a
creature for a client) because I needed fairly accurate reporting.
For browsing, which I assume you want to do, you might be able to ignore all
the stuff inside of function definitions (i.e., everything between the curlies),
being careful not to ignore the curlies of class defs. This eliminates much
of the C++ grammar right there. Then, you can simply list the possible
variable, class, function, and type definitions in some grammar rule.
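
As a rough illustration of the "ignore what is between the curlies" idea, here is a minimal Python sketch. It is not the tool described in this post. The names TOKEN and strip_function_bodies are hypothetical, and the rule "a left curly is a function body if a '(' has appeared since the last ';', '{' or '}'" is an assumed heuristic that will misfire on some inputs (for example, a class head that contains parentheses).

```python
import re

# Hypothetical token pattern: comments and literals first, so that braces
# inside them are never mistaken for real curlies.
TOKEN = re.compile(r'''
    //[^\n]*              |   # line comment
    /\*.*?\*/             |   # block comment
    "(?:\\.|[^"\\])*"     |   # string literal
    '(?:\\.|[^'\\])*'     |   # character literal
    [A-Za-z_]\w*          |   # identifier or keyword
    \d[\w.]*              |   # number (loosely)
    \S                        # any other single character
''', re.VERBOSE | re.DOTALL)


def strip_function_bodies(source):
    """Return the token stream with each function body collapsed to '{}'.

    Assumed heuristic: a '{' opens a function body when a '(' has been seen
    since the last ';', '{' or '}'. Any other '{' (class, struct, namespace,
    enum, initializer) is kept and its contents are scanned normally.
    """
    tokens = [t for t in TOKEN.findall(source)
              if not t.startswith(("//", "/*"))]
    out = []
    head = []  # tokens seen since the last ';', '{' or '}'
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "{" and "(" in head:
            # Function body: skip ahead to the matching right curly.
            depth = 1
            i += 1
            while i < len(tokens) and depth:
                if tokens[i] == "{":
                    depth += 1
                elif tokens[i] == "}":
                    depth -= 1
                i += 1
            out.append("{}")
            head = []
            continue
        out.append(tok)
        if tok in ("{", "}", ";"):
            head = []
        else:
            head.append(tok)
        i += 1
    return " ".join(out)
```

The class curlies survive while member-function and free-function bodies are collapsed, so whatever parses the result only ever sees declarations.
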
Naturally, this ``list of possibilities rule'' will not be LL(k) for any
finite k; therefore, I use the predicated-LL(k) strategy of PCCTS to do
something like this:

    global_decl : (this is tough to parse)?
                | (oh boy, is this one really tough)?
                | (you'll never get this one without arbitrary lookahead)?
                | last choice is obvious--no predicate needed

This alternative list informs ANTLR (the parser generator of PCCTS)
that it will never figure out what to do with a finite amount of
lookahead from the left edge of global_decl; hence, it should simply
figure it out at parse time with selective backtracking.
The rest of the grammar is pretty much LL(k) so things proceed
in a normal, deterministic manner.
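
To make the selective backtracking concrete, here is a small hand-written Python sketch. It is not ANTLR output and not the grammar built for the client. The Parser class, its guess method, and the three toy alternatives are all hypothetical, and the input is assumed to be a pre-split token list. Each guarded alternative is first tried speculatively and the input position is rewound. Only the last alternative is taken without a guess.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected=None):
        """Consume one token; with no argument, consume any token."""
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ParseError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def ident(self):
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def guess(self, rule):
        """Try a rule speculatively, always rewind, report whether it matched."""
        start = self.pos
        try:
            rule()
            return True
        except ParseError:
            return False
        finally:
            self.pos = start

    def skip_block(self):
        """Consume a balanced '{' ... '}' without looking inside."""
        self.match("{")
        depth = 1
        while depth:
            tok = self.match()
            depth += (tok == "{") - (tok == "}")

    def class_def(self):
        self.match("class")
        name = self.ident()
        self.skip_block()
        self.match(";")
        return ("class", name)

    def function_def(self):
        words = [self.ident()]
        while self.peek() != "(":
            words.append(self.ident())
        self.match("(")
        while self.peek() != ")":
            self.match()
        self.match(")")
        self.skip_block()
        return ("function", words[-1])

    def variable_decl(self):
        words = [self.ident()]
        while self.peek() != ";":
            words.append(self.ident())
        self.match(";")
        return ("variable", words[-1])

    def global_decl(self):
        if self.guess(self.class_def):      # guarded alternative
            return self.class_def()
        if self.guess(self.function_def):   # guarded alternative
            return self.function_def()
        return self.variable_decl()         # last choice, no guess needed

    def translation_unit(self):
        decls = []
        while self.peek() is not None:
            decls.append(self.global_decl())
        return decls
```

For example, Parser("class A { int x ; } ; int f ( int a ) { } int counter ;".split()).translation_unit() classifies the three declarations as a class, a function, and a variable. Backtracking happens only at the left edge of global_decl; everything below that point runs without rewinding.
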
BTW, I had to assume NON-preprocessed input--as long as the preprocessor
symbols did not change the C++ structure (like using BEGIN instead of left
curly), things were ok. This saved a lot of parse time.
Just thought I'd mention a different approach.