Re: Is LALR(1) or LL(k) parser better for C++

John Lilley <jlilley@empathy.com>
22 Jan 1997 23:04:49 -0500

          From comp.compilers

Related articles
Is LALR(1) or LL(k) parser better for C++ kikonen@cs.joensuu.fi (1997-01-22)
Re: Is LALR(1) or LL(k) parser better for C++ jlilley@empathy.com (John Lilley) (1997-01-22)
Re: Is LALR(1) or LL(k) parser better for C++ dlmoore@ix.netcom.com (David L Moore) (1997-01-25)
Re: Is LALR(1) or LL(k) parser better for C++ thetick@scruz.net (Scott Stanchfield) (1997-01-26)
Re: Is LALR(1) or LL(k) parser better for C++ mrs@kithrup.com (1997-01-26)
Re: Is LALR(1) or LL(k) parser better for C++ dlmoore@ix.netcom.com (David L Moore) (1997-01-29)
| List of all articles for this month |

From: John Lilley <jlilley@empathy.com>
Newsgroups: comp.compilers
Date: 22 Jan 1997 23:04:49 -0500
Organization: Nerds for Hire, Inc.
References: 97-01-163
Keywords: C++, parse

Kari Ikonen wrote:
> Is LALR(1) or LL(k) based parser better for parsing C++ code?
> Which one of these is easier to handle?


Personally, I am using an LL(1) grammar with backtracking and
predicates. Both of these questions are complex and don't have any
pat answers. It comes down to several factors:


1) How important is performance? I believe that an LL(1) C++ parser
will be forced to backtrack (or perform some other hack such as
deferred ambiguity resolution) more often than LR(1). Backtracking is
only mildly expensive when done well, but a small performance hit may
be significant in a production compiler. More importantly,
backtracking across productions that require actions (such as managing
the scope context) can be tricky. That being said, most commercial
C++ compilers (so I hear) are hand-crafted recursive-descent parsers,
which is more closely related to LL than LR, so go figure...


2) What paradigm are you more comfortable with? I find LL to be much
easier to grasp and debug than LALR, but that's because LALR makes my
brain hurt. Someone smarter than I may find LALR easier to work with,
because fewer ambiguities will be encountered. But an LALR(1)
parser-generator with no backtracking may prove to be unworkable. LL
parsers have a single method to implement each rule, and repetition
will generate while{} loops instead of recursion, so the generated LL
parser will resemble the grammar rather closely. LALR parsers usually
generate a giant state table for the PDA machine, which is difficult
to debug without special tools.


3) How good are the respective tools? I use PCCTS and find it to be
fabulously good (albeit somewhat quirky). PCCTS generates C++ classes
for the lexers and parsers, and makes it easy to attach multiple
parsers to a lexer, hook up streams of tokens as input to a parser,
run more than one parser at the same time, etc. This is important in
C++, because you reparse tokens in at least three cases (unless you
are very clever):


1) The expressions of #if/#elif in the preprocessor
2) The bodies of inline methods declared inside class definitions.
3) Template expansion.


There are some LALR tools that generate parsers as C++ classes (e.g.,
Visual Parse++). There are also some LALR tools that backtrack (e.g.,
btyacc). I'm not sure if any do both.


Obviously, this is not a simple question. To get an idea for the
complexity involved, get Jim Roskind's analysis of an LALR(1) C++
grammar from the comp.compiler archives or
http://www.emapthy.com/pccts/roskind.html. You can get a sample LL(2)
C++ grammar that I've been using from http://www.empathy.com/pccts.
Niether of these grammars is complete. The LALR(1) grammar does not
deal with templates at all. My LL(2) grammar is weak in a number of
ways (although I'm working on an improvement).


I think that in the end, the LL vs LR decision may pale in comparison
to the overall size and complexity of the task. My parser is now 30k
lines of code (and that is low because uses a lot of STL containers).
I suspect that size will double before the parser is complete, and
will double again when enough semantic analysis is added to perform
complete error reporting.


> [I suspect the correct answer to this question is "no". -John]


:) Right -- neither straight LL(k) nor LR(k) work for any finite k.
C++ is such a horrible language to parse.


john lilley
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.