Re: Hand-written parsers?

"Mike Dimmick" <mike@dimmick.demon.co.uk>
24 Dec 2000 16:00:46 -0500

          From comp.compilers

From: "Mike Dimmick" <mike@dimmick.demon.co.uk>
Newsgroups: comp.compilers
Date: 24 Dec 2000 16:00:46 -0500
Organization: Compilers Central
References: 00-12-102
Keywords: parse, practice, comment
Posted-Date: 24 Dec 2000 16:00:46 EST



"Thomas Luzat" <thomas.luzat@gmx.net> wrote in message
> I'm wondering a bit what most commercial (mainly C++) compilers use:
> Hand-written parsers or parsers generated by compilers such as yacc?
>
> Is there anything bad/good in writing a hand-crafted parser? Would
> recursive-descent be okay for a language such as C++?


I quote from Stroustrup, "The Design and Evolution of C++", section
3.3.2, "Parsing C++":


"In 1982 when I first planned Cfront [AT&T's first C++ compiler], I
wanted to use a recursive descent parser because I had experience
[with them], because I liked such parsers' ability to produce good
error messages, and because I liked having the full power of a
general-purpose programming language available when decisions had to
be made in the parser. [...] Al Aho and Steve Johnson [...] convinced
me that writing a parser by hand was most old-fashioned, would be an
inefficient use of my time, [...] and would be prone to unsystematic
and [...] unreliable error recovery. The right way was to use an
LALR(1) parser generator, so I used Al and Steve's YACC.


"For most projects, it would have been the right choice. For almost
every project writing an experimental language from scratch, it would
have been the right choice. [...] [F]or me and C++, it was a bad
mistake. [...] My bias toward top-down parsing has shown itself many
times over the years in the form of constructs that are hard to fit
into a YACC grammar. To this day, Cfront has a YACC parser
supplemented by much lexical trickery relying on recursive descent
techniques. On the other hand, it _is_ possible to write an efficient
and reasonably nice recursive descent parser for C++. Several modern
compilers use recursive descent."


Now, for what I'm doing. My final year undergraduate project is to
produce a program which generates UML class hierarchy diagrams from
C++ source. Therefore, I need a C++ parser. I've done some
investigation into tools and decided against YACC because a) it was
complicated and b) I couldn't work out where to place my actions. The
language seems to be better suited (as Stroustrup says) to LL(k)
parsing than to LR.
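
To make the recursive-descent idea concrete, here is a minimal sketch
for a toy arithmetic grammar (nowhere near real C++). The class name
ExprParser and the grammar itself are my own illustration, not taken
from Cfront or from any of the tools mentioned here. It is only meant
to show the two properties Stroustrup cites: one ordinary function per
grammar rule, and error messages written at exactly the point where
the parser knows what it expected.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy grammar (an assumption for illustration only):
//   expr   := term   { ('+' | '-') term }
//   term   := factor { ('*' | '/') factor }
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string text) : src_(std::move(text)) {}

    // Parse the whole input and return its integer value.
    long parse() {
        assert(pos_ == 0);  // this sketch expects one parse() call per object
        long value = expr();
        skipSpaces();
        if (pos_ != src_.size()) fail("unexpected trailing input");
        return value;
    }

private:
    // One function per grammar rule: the call stack mirrors the parse tree.
    long expr() {
        long value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    long term() {
        long value = factor();
        for (;;) {
            if (accept('*')) {
                value *= factor();
            } else if (accept('/')) {
                long divisor = factor();
                if (divisor == 0) fail("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }

    long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) fail("expected ')'");
            return value;
        }
        skipSpaces();
        if (!atDigit()) fail("expected a number or '('");
        long value = 0;
        while (atDigit()) value = value * 10 + (src_[pos_++] - '0');
        return value;
    }

    // Consume c if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    bool atDigit() const {
        return pos_ < src_.size() &&
               std::isdigit(static_cast<unsigned char>(src_[pos_]));
    }

    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // The error is raised where the expectation is known.
    [[noreturn]] void fail(const std::string& what) const {
        throw std::runtime_error(what + " at offset " + std::to_string(pos_));
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Calling ExprParser("(1 + 2) * 3").parse() returns 9, while an input
with a missing ")" throws a std::runtime_error naming the offset where
the parenthesis was expected. A real C++ front end needs far more than
this, of course.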


After examining PRECCX and Philips' "Elegant" compiler-compiler
system, I rejected both of those because although they support porting
to Windows, they don't support creating native projects. I've settled
on PCCTS, which uses predicates in the parse. You can find out more
about writing parsers with the C++ version of PCCTS (ANTLR version 2.x
uses Java - http://www.antlr.org/) at http://www.polhode.com/pccts/.
There is a C++ parser available written by NeXT, and one written by
John Lilley, but the first covers C++ as of 1995, and the second as of
1997. The first also relies on 2 symbols of lookahead in places, and
the second utilises a somewhat hacked version of the parser generator.
I've been trying to get it working with the stock version, with some
difficulty. PCCTS/ANTLR generates recursive-descent parsers.


HTH,
--
Mike Dimmick
Final Year Undergraduate, Aston University, UK.
[A counterargument says that if Stroustrup had paid attention to the
error messages from yacc, maybe the syntax of C++ wouldn't be such a
mess and at least wouldn't be ambiguous. -John]

