|what parser generator? Drum.Sefex@btinternet.com (Paul Drummond) (2000-12-18)|
|Re: what parser generator? firstname.lastname@example.org (Hans-Bernhard Broeker) (2000-12-18)|
|Re: what parser generator? email@example.com (Ira D. Baxter) (2000-12-19)|
|Re: what parser generator? firstname.lastname@example.org (Mike Dimmick) (2000-12-19)|
|Re: what parser generator? Drum.Sefex@btinternet.com (Paul Drummond) (2000-12-20)|
|Re: what parser generator? email@example.com (Ira D. Baxter) (2000-12-21)|
|Re: what parser generator? firstname.lastname@example.org (2001-01-09)|
|From:||"Ira D. Baxter" <email@example.com>|
|Date:||21 Dec 2000 14:56:41 -0500|
|Organization:||Posted via Supernews, http://www.supernews.com|
|References:||00-12-079 00-12-083 00-12-086 00-12-098|
|Posted-Date:||21 Dec 2000 14:56:41 EST|
> Ira D. Baxter wrote:
> > We think it is better to parse the unpreprocessed
> > text directly.
"Paul Drummond" <Drum.Sefex@btinternet.com> wrote in message
> I have tried parsing unpreprocessed text and it is practically
> impossible, if you want your product to work on any piece of C++ code,
> then there are loads of things that can cause problems! I am a linux
> user these days but before that I used Windows MFC and I tested my
> (crap) parser on the MFC code. Guess where it crapped out! There is
> code like this:
> #if MACRO1
> while(blah, blah);
> How can you keep track of braces in a situation like this? The only
> solution, in my opinion, is to preprocess.
Yep, badly overlapped nested constructs from the preprocessor and the
actual program text sure make a mess. (Our opinion is that this style
is, well, not exactly the best, but we have to live with what people write.)
> Please explain how far you have come to parsing without preprocessing.
> Also, surely there is a ready-made preprocessor out there that gives
> the user the option of leaving comments in????
> [What you do about the preprocessor depends heavily on what your overall
> goals are. You can certainly preprocess and parse the output, but if
> you want to make a symbol cross-ref, that'll lose anything in #if that's
> not included as well as all of the preprocessor symbols. I'd do pattern
> matching and fake it. -John]
We hear the GCC preprocessor can retain comments. But our attitude is
"so what?". As the moderator points out, what you do depends on your
goals. We want to analyze and *modify* the original program. Once
the preprocessor directives are gone, any cool modifications you might
make to the remaining program text will be instantly rejected by the
owning programmers, because you lost the preprocessor directives. (I
suppose you could attempt to put them back somehow, but this seems
like an equally big pain.) Somehow, parsing the unpreprocessed text
seems like the *only* solution.
As I said before, we don't have a complete solution to parsing
unpreprocessed text. What we do now is to add a few extra rules to
our grammar to cover the ugly instances we run into, and then apply
transforms to the recognized ugly stuff to normalize away the poor
nesting. In the samples (a million-line system with 1800 files) we
have seen, this trick occurs only a few dozen times, and generally in
the same way; it appears to be something that a particular programmer
took up as a style. With these changes and a few related tricks, we
are able to parse *and automatically modify* these large systems.
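
To give a feel for what "recognizing the ugly instances" involves, here is a
minimal C++ sketch that merely locates conditional arms whose braces do not
balance, so they can be inspected or handed to a normalizing transform. It is
not the grammar-based approach described above, only a line-oriented heuristic.
The names IllNestedBranch, parse_directive and find_ill_nested_branches are
hypothetical, and the sketch assumes braces never appear inside strings,
comments or character literals and that a nested conditional's braces count
only toward its innermost arm.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One #if/#elif/#else arm whose braces do not balance (hypothetical type).
struct IllNestedBranch {
    std::size_t directive_line;  // 1-based line of the directive opening the arm
    int net_braces;              // count of '{' minus count of '}' inside the arm
};

// Returns true if `line` is a preprocessor directive; `name` receives the
// directive word ("if", "else", "define", ...), possibly empty.
static bool parse_directive(const std::string& line, std::string& name) {
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#') return false;
    name.clear();
    i = line.find_first_not_of(" \t", i + 1);
    if (i == std::string::npos) return true;  // a lone '#'
    std::size_t j = i;
    while (j < line.size() && std::isalpha(static_cast<unsigned char>(line[j]))) ++j;
    name = line.substr(i, j - i);
    return true;
}

// Scans unpreprocessed source text and reports every conditional arm that
// opens or closes more braces than it closes or opens.
std::vector<IllNestedBranch> find_ill_nested_branches(const std::string& source) {
    struct Open { std::size_t line; int net; };
    std::vector<Open> stack;  // currently open arms, innermost last
    std::vector<IllNestedBranch> out;

    // Finish the innermost arm and record it if its braces are unbalanced.
    auto close_arm = [&]() {
        if (stack.empty()) return;
        if (stack.back().net != 0)
            out.push_back({stack.back().line, stack.back().net});
        stack.pop_back();
    };

    std::istringstream in(source);
    std::string line, name;
    std::size_t lineno = 0;
    while (std::getline(in, line)) {
        ++lineno;
        if (parse_directive(line, name)) {
            if (name == "if" || name == "ifdef" || name == "ifndef") {
                stack.push_back({lineno, 0});
            } else if (name == "elif" || name == "else") {
                if (!stack.empty()) {  // ignore a stray #else/#elif
                    close_arm();
                    stack.push_back({lineno, 0});
                }
            } else if (name == "endif") {
                close_arm();
            }
            continue;  // other directives (e.g. #define) are not scanned
        }
        if (stack.empty()) continue;  // text outside any conditional
        for (char c : line) {
            if (c == '{') ++stack.back().net;
            else if (c == '}') --stack.back().net;
        }
    }
    return out;
}
```

Run over a fragment in which an #if arm opens "if (a) {" and the #else arm
opens "while (b) {", with the shared body and closing brace after the #endif,
this reports both arms with a net count of +1. Arms flagged this way are the
candidates for the extra grammar rules and the normalizing transforms.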
We have some other ideas about how to analyze the preprocessor
directives to avoid this trick, but they aren't tested well enough to
discuss here yet.
Ira D. Baxter, Ph.D.,CTO email: firstname.lastname@example.org
Semantic Designs, Inc. web: http://www.semdesigns.com
12636 Research Blvd. C-214 voice: (512) 250-1018 x140
Austin, TX 78759-2200 fax: (512) 250-1191