Re: what parser generator?

Hans-Bernhard Broeker
18 Dec 2000 12:17:51 -0500

          From comp.compilers


From: Hans-Bernhard Broeker
Newsgroups: comp.compilers
Date: 18 Dec 2000 12:17:51 -0500
Organization: Aachen University of Technology (RWTH)
References: 00-12-079
Keywords: C++, parse, comment
Posted-Date: 18 Dec 2000 12:17:50 EST

Paul Drummond wrote:
> I am writing a C++ DocTool for my 3yr uni project and I have been looking
> at different generators.

> COCO/R was the first choice because we are learning it at uni, but my
> lecturer says it would be very difficult to extract comments using this.
> Does anyone disagree with this?

Never having even heard of this COCO/R tool, I cannot disagree. But
from my own experience with a C-analysing program, I can say that it's
probably your best bet to explicitly reflect the preprocessing phase
of C-like languages in your analyser. I.e. first preprocess to isolate
the comments from the rest of the text (keeping pointers into the
preprocessed text as links between them), and then lex/parse the
remaining text to understand the structure.
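
A minimal sketch of that first pass, in Python for brevity. The function
name split_comments and the (offset, text) record format are illustrative
assumptions, not taken from any existing tool. Comments are blanked out
with spaces rather than deleted, so every offset into the stripped text is
also a valid offset into the original, which is one simple way of keeping
the "pointers" between the two.

```python
def split_comments(src):
    """Separate C/C++ comments from code.

    Returns (code, comments). `code` has the same length as `src`, with each
    comment replaced by spaces (newlines kept). `comments` is a list of
    (offset, text) pairs, where offset indexes into both `src` and `code`.
    """
    code, comments = [], []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if src.startswith("//", i):
            # Line comment: runs up to, but not including, the newline.
            j = src.find("\n", i)
            if j == -1:
                j = n
            comments.append((i, src[i:j]))
            code.append(" " * (j - i))
            i = j
        elif src.startswith("/*", i):
            # Block comment: runs through the closing */ (or to end of input).
            j = src.find("*/", i + 2)
            j = n if j == -1 else j + 2
            text = src[i:j]
            comments.append((i, text))
            code.append("".join(c if c == "\n" else " " for c in text))
            i = j
        elif ch in "\"'":
            # String or character literal: copy verbatim so that comment
            # markers inside it are not mistaken for comments.
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            code.append(src[i:j])
            i = j
        else:
            code.append(ch)
            i += 1
    return "".join(code), comments
```

This deliberately ignores backslash-continued // comments and anything
the real preprocessor does with macros, so treat it as a starting point
only.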

Writing a somewhat complete grammar for
'C++-with-all-comments-still-in' is quite a bit more tedious than one
for 'C++-after-preprocessing'. You'll roughly double the number of
rules in the grammar unless you have a parser generator that
understands 'optional' symbols in the grammar. Even then, the tons of
'comment_optional' terms would make the grammar quite unreadable.

> The alternative is to write my own parser. I don't think it would be
> IMPOSSIBLE because I never enter function bodies, so I don't need to look
> for expressions, loops or anything.

You do have to, at least partly. It's the only reliable way of finding
the _end_ of a function: you have to at least count braces. Not to
mention the occasional #ifdef section, and whatever other
complications C++ has added that I don't even know of (I went from C
straight to Java).
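
Counting braces could look roughly like the following Python sketch. The
function name find_matching_brace is made up for illustration, and the
code assumes comments have already been removed from the input. String
and character literals are skipped so that a brace inside a literal is
not counted.

```python
def find_matching_brace(code, open_pos):
    """Return the index of the '}' closing the '{' at open_pos, or -1."""
    if code[open_pos] != "{":
        raise ValueError("open_pos must point at '{'")
    depth = 0
    i, n = open_pos, len(code)
    while i < n:
        ch = code[i]
        if ch in "\"'":
            # Skip the literal; i ends up on its closing quote.
            i += 1
            while i < n and code[i] != ch:
                i += 2 if code[i] == "\\" else 1
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1  # ran off the end: braces are unbalanced
```

An #ifdef/#else pair in which both branches open a brace that is closed
only once afterwards will throw the count off, which is exactly the kind
of complication mentioned above.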

Been there, done that: 'cscope' tries to understand enough of C to
generate a full cross reference, while only using 'lex'. I.e. no
'yacc'. It ends up simulating grammar rules by rather complex regular
expressions, and it fails for certain special cases (anything with
parentheses inside a function parameter list).

Hans-Bernhard Broeker
[Rather than putting the comments in the grammar, I'd fake it in the
lexer and hang the comment text on the preceding or following token.
That's not perfect, but it's not much less perfect than a lot of
more complicated schemes. -John]
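
One way the lexer trick might be sketched, in Python. The Token class,
the lex function and the token pattern are all illustrative assumptions
and cover only a toy subset of C++ lexical syntax. Comments are never
emitted as tokens; they are collected and hung on the next real token,
so the grammar never sees them.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Token:
    text: str
    offset: int
    comments: list = field(default_factory=list)  # comments attached here

# Toy token pattern: comments, literals, identifiers/integers, punctuation.
_PATTERN = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<word>[A-Za-z_]\w*|\d+)
  | (?P<punct>\S)
""", re.VERBOSE | re.DOTALL)

def lex(src):
    """Tokenize src, attaching each comment to the token that follows it."""
    tokens, pending = [], []
    for m in _PATTERN.finditer(src):
        if m.lastgroup == "comment":
            pending.append(m.group())
        else:
            tokens.append(Token(m.group(), m.start(), pending))
            pending = []
    if pending and tokens:
        # Comments after the last token have no follower; use the last token.
        tokens[-1].comments.extend(pending)
    return tokens
```

For a documentation tool, attaching to the following token suits comments
written above a declaration; a trailing comment on the same line would
arguably belong to the preceding token instead, which is the imperfection
the note above concedes.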
