Re: Fuzzy parsing info sought

Terence Parr <parrt@everest.ee.umn.edu>
Fri, 19 Aug 1994 16:53:55 GMT

          From comp.compilers


Newsgroups: comp.compilers
Keywords: parse
Organization: Compilers Central
References: 94-08-080

Gary,


Much can be done using "fuzzy" parsing to collect definitions in C++.
However, I have taken a different approach (I have just rapidly built such a
creature for a client) because I needed fairly accurate reporting.


For browsing, which I assume you want to do, you might be able to ignore all
the stuff inside function definitions (i.e., everything between the curlies),
being careful not to ignore the curlies of class defs. This eliminates much
of the C++ grammar right there. Then, you can simply list the possible
variable, class, function, and type definitions in some grammar rule.
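As a rough illustration of "ignore everything between the curlies of a function", here is a minimal Python sketch. It assumes a toy tokenizer (no handling of comments, strings, or multi-character operators) and a hypothetical heuristic: a `{` that directly follows `)` or `const` opens a function body, while any other `{` (class, struct, namespace) is kept so its members are still seen. The names `tokenize` and `skip_function_bodies` are invented for this example.

```python
import re

# Toy tokenizer (an assumption for this sketch): identifiers, integers,
# and single punctuation characters. Comments and strings are not handled.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(src):
    return TOKEN_RE.findall(src)


def skip_function_bodies(tokens):
    """Return tokens with the contents of function bodies removed.

    Hypothetical heuristic: a '{' right after ')' or 'const' opens a
    function body; every other '{' is kept, so class members survive.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "{" and out and out[-1] in (")", "const"):
            depth = 1
            i += 1
            # Walk forward, counting nested braces, until the body closes.
            while i < len(tokens) and depth > 0:
                if tokens[i] == "{":
                    depth += 1
                elif tokens[i] == "}":
                    depth -= 1
                i += 1
            out.extend(["{", "}"])  # leave an empty body as a placeholder
            continue
        out.append(tok)
        i += 1
    return out
```

For example, `class A { int f() { if (x) { y(); } return 1; } int g; };` reduces to the tokens of `class A { int f ( ) { } int g ; } ;`: the member function's body disappears but the class braces and the data member remain, which is all a browser needs.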
Naturally, this ``list of possibilities rule'' will not be LL(k) for any
finite k; therefore, I use the predicated-LL(k) strategy of PCCTS to do
something like:


global_decl : (this is tough to parse)?
| (oh boy, is this one really tough)?
| (you'll never get this one without arbitrary lookahead)?
| last choice is obvious--no predicate needed
;


The predicates in this alternative list tell ANTLR (the parser generator of
PCCTS) that it cannot decide among these alternatives with any finite amount
of lookahead from the left edge of global_decl; hence, it should decide at
parse time with selective backtracking, trying each predicated alternative
in turn and taking the first one that matches.
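To make the idea of selective backtracking concrete, here is a small Python analogy; it is not PCCTS/ANTLR syntax or generated code. Only `global_decl` comes from the post; the other rule names (`qualified_name`, `function_def`, `var_decl`, `class_def`) and the token lists are hypothetical. The first two alternatives share an arbitrarily long prefix (a qualified name such as `A::B::C`), so no fixed k can separate them; `speculate` tries a rule on a trial basis and always rewinds the input.

```python
class ParseError(Exception):
    pass


class Parser:
    """Toy recursive-descent parser with selective backtracking (an analogy only)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def speculate(self, rule):
        """Report whether rule would match here, without consuming input."""
        saved = self.pos
        try:
            rule()
            return True
        except ParseError:
            return False
        finally:
            self.pos = saved  # always rewind: this is only a lookahead test

    # qualified_name : ID ( '::' ID )* ;
    def qualified_name(self):
        parts = [self.ident()]
        while self.peek() == "::":
            self.match("::")
            parts.append(self.ident())
        return "::".join(parts)

    # function_def : qualified_name qualified_name '(' ')' '{' '}' ;
    def function_def(self):
        self.qualified_name()
        name = self.qualified_name()
        for t in ("(", ")", "{", "}"):
            self.match(t)
        return ("function", name)

    # var_decl : qualified_name qualified_name ';' ;
    def var_decl(self):
        self.qualified_name()
        name = self.qualified_name()
        self.match(";")
        return ("variable", name)

    # class_def : 'class' ID '{' '}' ';' ;
    def class_def(self):
        self.match("class")
        name = self.ident()
        for t in ("{", "}", ";"):
            self.match(t)
        return ("class", name)

    def global_decl(self):
        # Predicated alternatives: try each on a trial basis, in order.
        if self.speculate(self.function_def):
            return self.function_def()
        if self.speculate(self.var_decl):
            return self.var_decl()
        # Last choice is obvious (it starts with 'class'): no predicate needed.
        return self.class_def()
```

With the tokens of `std::string ns::Foo::bar() {}`, `global_decl()` returns `("function", "ns::Foo::bar")`; with `a::b x;` it returns `("variable", "x")`. A predicated alternative that succeeds is parsed twice (once on trial, once for real), which is why this kind of backtracking is best confined to the few rules that are not LL(k).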


The rest of the grammar is pretty much LL(k) so things proceed
in a normal, deterministic manner.


BTW, I had to assume NON-preprocessed input--as long as the preprocessor
symbols did not change the C++ structure (like using BEGIN instead of left
curly), things were ok. This saved a lot of parse time.
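The post does not say how directive lines were handled when reading unpreprocessed source. One simple possibility, assumed here purely for illustration, is to drop `#` directive lines (including backslash continuations) before tokenizing, leaving macro uses in the code as ordinary identifiers. The function name `strip_directives` is hypothetical.

```python
def strip_directives(src):
    """Drop preprocessor directive lines (and their backslash continuations).

    Macros used in the code itself are left alone and simply show up as
    ordinary identifiers, which is fine as long as they do not change the
    C++ structure (e.g. a macro standing in for '{').
    """
    kept = []
    in_directive = False
    for line in src.splitlines():
        if in_directive or line.lstrip().startswith("#"):
            # A trailing backslash continues the directive onto the next line.
            in_directive = line.rstrip().endswith("\\")
            continue
        kept.append(line)
    return "\n".join(kept)
```

For instance, given an `#include` line, a two-line `#define M(a)` and then `int v = M(2);`, only `int v = M(2);` is left for the parser.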


Just thought I'd mention a different approach.


Regards,
Terence Parr
--

