Re: An "open" letter to Karsten Nyblad (and other compiler compiler implementors)

Chris F Clark <cfc@shell01.TheWorld.com>
24 Jun 2005 14:14:26 -0400

          From comp.compilers

Related articles
[2 earlier articles]
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im wyrmwif@tsoft.org (SM Ryan) (2005-06-22)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im qrczak@knm.org.pl (Marcin 'Qrczak' Kowalczyk) (2005-06-23)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im cfc@shell01.TheWorld.com (Chris F Clark) (2005-06-23)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im wyrmwif@tsoft.org (SM Ryan) (2005-06-24)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im vtsikoza@yahoo.com (2005-06-24)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im wyrmwif@tsoft.org (SM Ryan) (2005-06-24)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im cfc@shell01.TheWorld.com (Chris F Clark) (2005-06-24)
Re: An "open" letter to Karsten Nyblad (and other compiler compiler im schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-06-26)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 24 Jun 2005 14:14:26 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 05-06-111 05-06-117
Keywords: LALR, LR(1), parse
Posted-Date: 24 Jun 2005 14:14:25 EDT

SM Ryan wrote:
> Peruse past posting about people struggling to make a grammar LALR(1).
> Or struggling with yacc's various "improvements" to overcome LALR
> problem. You can continue to patch a broken program, or start over and
> do it right.


Having spent several years helping struggling grammar writers get
their grammars right, I can say with a fair amount of confidence that
most problems in a grammar are not resolved by switching from LALR(1)
to LR(k). For this note, that Yacc++ has supported LR(1) for
customers' grammars for years and LR(k) internally for longer. Most
grammar problems are due to the desired grammar being ambiguous. It
is a rare grammar that gets fixed by going to LR(k). In fact, I can't
recall one problem that came in, where LR(k) was the "solution".


The most common source of problems (other than bugs in our tool) was
the attempt to move semantics into the grammar. A typical example is
distinguishing between SQL expressions that return empty rows versus
ones that don't. A lot of expressions don't make sense when applied
to empty rows, but checking for the expressions which can generate
them almost universally leads to a broken SQL grammar.


The problem is slightly different at the lexical level. One really
can use more characters of lookahead (and/or backtracking) to
implement longest match tokens right. However, the LR-ness (as
opposed to LALR-ness or even LL-ness) is not as much an issue. Once,
one allows even LL recursive token definitions, one covers the
interesting cases (nested comments and similar constructs). The
if-then-else problem doesn't occur on the intratoken level, as one
doesn't make sub-token trees.


Historical notes: Yacc++ is not a derivative work of Yacc or Bison
(although it does process "roughly" the same grammar format). It was
originally an LALR(k), k unbounded, parser generator. The first
versions were written in Pascal for the Atari ST and part of a tool we
called JADE, around about 1986. When we rewrote to C/C++ for the Sun
in about 1990, we added state splitting loosely following David
Spector's article and turned off the unbounded k-lookeahead feature
for external use, which made it LR(1). The C++ grammar we wrote for it
in 1991 convinced us that turning off unbounded k lookahead, which can
make Yacc++ loop, was a good idea. And this isn't just because Yacc++
blindly incrmements k, when encountering a conflict--it has a very
clever (at least in my opinion) algorithm that actually knows when
certain rules are making "progress" toward converging to closure (and
prints errors for ones that don't).


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.