Re: LR (k) vs. LALR

Chris F Clark <cfc@shell01.TheWorld.com>
7 Sep 2004 23:55:11 -0400

From comp.compilers


From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	7 Sep 2004 23:55:11 -0400
Organization:	The World Public Access UNIX, Brookline, MA
References:	04-08-037 04-08-055 04-08-073 04-08-098 04-08-111 04-08-145 04-09-015
Keywords:	parse, LALR
Posted-Date:	07 Sep 2004 23:55:11 EDT

> The author states that he wrote the GLR parser generator solely to
> handle C++ language spec [and someone lapped it up to handle Java].
>
> What exactly is it about OO languages that an LALR(1) parser cannot
> handle?

As the moderator noted, there is nothing about "OO" languages that
LALR(1) parsers cannot handle, but C++ itself is problematic. There
are LALR(1) and LL grammars for Java.

One of the problems with C++ is that expressions and declarations can
look exactly the same (technically, any language containing the "or"
of those two productions is ambiguous). C++ gets around that by
saying that if it looks like a declaration, it is a declaration,
forcing the "or" to be resolved in a particular direction and thereby
resolving the ambiguity. However, that resolution is not expressed
grammatically: one cannot take two arbitrary context-free rules,
take their difference, and expect the result to be a context-free
language, which is what the C++ ambiguity resolution requires one to
do.

In contrast, GLR grammars are not required to be unambiguous. Any
ambiguity is resolved by producing a parse forest that represents all
the potential ambiguous choices and requiring a later "semantic" pass
to choose which parse tree in the forest is the desired one. Thus,
with a GLR parser, one can disambiguate the C++ problem by selecting
the parse tree that treats all the ambiguous expression/declaration
sub-trees as declarations.
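
To make the "later pass" concrete, here is a minimal sketch in Python.
It is not taken from any particular GLR tool: the Node and Ambig
classes and the disambiguate function are names I made up for
illustration, and I assume the parser marks each ambiguous span with a
node listing every alternative it found.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    kind: str                 # e.g. "declaration", "expression", "block"
    children: List["Tree"]


@dataclass
class Ambig:
    # Hypothetical forest node: every parse found for one span of input.
    alternatives: List[Node]


Tree = Union[Node, Ambig, str]


def disambiguate(tree):
    """Collapse a parse forest into one tree, preferring declarations."""
    if isinstance(tree, str):          # a token: nothing to resolve
        return tree
    if isinstance(tree, Ambig):
        decls = [a for a in tree.alternatives if a.kind == "declaration"]
        if len(decls) == 1:
            return disambiguate(decls[0])
        if len(tree.alternatives) == 1:
            return disambiguate(tree.alternatives[0])
        # The declaration rule does not settle this ambiguity.
        raise ValueError("unresolved ambiguity")
    return Node(tree.kind, [disambiguate(c) for c in tree.children])
```

The point of the ValueError is that any ambiguity the rule does not
cover shows up only when such an input is actually parsed, not when the
grammar is processed.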

The only problem with GLR as a technology is that there are no
"warnings" from the grammar processing tool that the language is
ambiguous. Well, there are warnings that the language is not LR (or
LALR, or whatever technology the GLR parser uses as a base). However,
some of those grammars will actually not be ambiguous and some of them
will be. In any case, once your GLR generator has given a warning,
you must either prove that the language actually isn't ambiguous or
write your semantic phase assuming that it is ambiguous and
disambiguate the resulting forest.

It is worth mentioning that there are other ways of handling ambiguous
grammars. In particular, one can use predicates to resolve
ambiguities. Predicates allow one to take the difference of two
productions in a controlled manner. For example, it is possible to
write a syntactic rule that says: try to parse this as a declaration,
and if it isn't one, parse it as an expression. The difference between
the predicated and the GLR solution is that predicated grammars are
still deterministic. There are no hidden ambiguities in a predicated
grammar. If your predicated parser generator gives you an error, you
still have an unresolved ambiguity, and if it doesn't, the resulting
parser will always construct a parse tree (and not a forest).
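
As a rough illustration of the "try a declaration, otherwise an
expression" rule, here is a hand-written sketch in Python. The toy
grammar is my own assumption, not real C++: a declaration is
name ( name ) ; and an expression is name ( name-or-number ) ; so an
input like T(a); matches both. All function and constant names are
hypothetical.

```python
def is_name(tok):
    return tok.isidentifier()


def is_arg(tok):
    return tok.isidentifier() or tok.isdigit()


def lit(text):
    """Build a test that accepts exactly one literal token."""
    return lambda tok: tok == text


# Toy token patterns (assumed for illustration only).
DECLARATION = [is_name, lit("("), is_name, lit(")"), lit(";")]
EXPRESSION = [is_name, lit("("), is_arg, lit(")"), lit(";")]


def match(tokens, pos, pattern):
    """Return the position after `pattern` if it matches at `pos`, else None."""
    for test in pattern:
        if pos >= len(tokens) or not test(tokens[pos]):
            return None
        pos += 1
    return pos


def parse_statement(tokens, pos=0):
    # Syntactic predicate: speculatively try the declaration first.
    end = match(tokens, pos, DECLARATION)
    if end is not None:
        return ("declaration", end)
    # Only when the predicate fails is the input treated as an expression.
    end = match(tokens, pos, EXPRESSION)
    if end is not None:
        return ("expression", end)
    raise SyntaxError("neither a declaration nor an expression")
```

With this sketch, the tokens of T(a); come back as a declaration and
those of f(1); as an expression, and there is always exactly one
answer.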

I would be remiss if I did not also point out backtracking parsers,
which are another solution to the problem. In fact, all the
implementations of predicated parsers that I know of use some form of
backtracking in their implementation. General backtracking parsers
share with GLR parsers the characteristic that they can parse
ambiguous grammars. Backtracking parsers generally also produce a
parse tree (although in theory they could also produce a forest).
Backtracking parsers have their own deficits though. Many
backtracking parsers will loop forever on some ambiguous grammars.
(Predicated backtracking parsers do not generally have this problem,
although they do not make the same linear time guarantees that pure LL
and LR parsers do (see note). Of course, any parser generator that can
handle a significant class of ambiguous grammars must be inherently
non-linear for some grammars, and GLR parsers have a cubic worst case,
same as Earley parsers.) In addition, most backtracking parsers resolve
ambiguities by selecting one parse tree out of the forest to return.
This is generally done by the order of the rules in the grammar (which
determines the order in which the rules are tried in ambiguous cases).
If one looks closely, this is very similar to using predicates
"implicitly" in the grammar. The key difference is that the tool
inserts the predicates rather than the user, and does so without
warning and usually without the run-time termination guarantees.
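
The effect of rule order can be seen in a small sketch. This is a
deliberately simple backtracking parser in Python that commits to the
first alternative that matches; real tools differ in how much they
backtrack. The grammar, the NAME token class, and the parse function
are all assumptions made up for this example.

```python
# Toy grammar: both alternatives of "stmt" match the same token sequence.
GRAMMAR = {
    "stmt": [["decl"], ["expr"]],
    "decl": [["NAME", "(", "NAME", ")", ";"]],
    "expr": [["NAME", "(", "NAME", ")", ";"]],
}


def parse(symbol, tokens, pos, grammar):
    """Return (tree, new_pos) for the first alternative that matches, or None."""
    if symbol not in grammar:                      # terminal symbol
        if pos < len(tokens) and (
            tokens[pos] == symbol
            or (symbol == "NAME" and tokens[pos].isidentifier())
        ):
            return tokens[pos], pos + 1
        return None
    for alternative in grammar[symbol]:            # tried in grammar order
        children, cur = [], pos
        for part in alternative:
            result = parse(part, tokens, cur, grammar)
            if result is None:
                break                              # back up, try the next one
            child, cur = result
            children.append(child)
        else:
            return (symbol, children), cur
    return None
```

Parsing the tokens of T(a); with GRAMMAR yields a "decl" sub-tree;
swapping the two alternatives of "stmt" yields an "expr" sub-tree
instead, and the tool says nothing about it in either case.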

I would like to mention that it is possible to build a predicated
parser using GLR technology, although I don't know of anyone
attempting to do so right now. From thought-experiments I have done
considering whether to implement such a tool, it seems like there
would be some advantages to building such a tool.

Again, I do not want to imply that these are the only techniques for
dealing with ambiguity. For example, Ralph Boland is pursuing some
generalization of LR technology that I gather will handle a wider
class of languages and I don't think his technique is any of the
above.

Note: Bryan Ford recently published a paper on a "predicated" parsing
technique that made extensive use of memoization and lazy evaluation
to achieve (if I recall correctly) a linear time guarantee. His
technique shares a characteristic with general backtracking parsers
in that the order of rules determines what is matched and the entire
tree is disambiguated that way. He uses an "ordered" or clause to
implement this.
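
For a feel of how memoization helps, here is a small Python sketch of
a recognizer with ordered choice, where each rule's result is cached
per input position so no rule is re-evaluated at the same position.
This is only my illustration of the general idea, not Ford's
implementation: it uses functools.lru_cache rather than lazy
evaluation, and the little arithmetic grammar and the names
make_parser and recognizes are assumptions.

```python
from functools import lru_cache


def make_parser(text):
    """Each rule maps a start position to an end position, or None."""

    @lru_cache(maxsize=None)
    def additive(pos):
        # Additive <- Multitive '+' Additive / Multitive
        mid = multitive(pos)
        if mid is not None and text[mid:mid + 1] == "+":
            end = additive(mid + 1)
            if end is not None:
                return end
        return mid                     # second alternative, already computed

    @lru_cache(maxsize=None)
    def multitive(pos):
        # Multitive <- Primary '*' Multitive / Primary
        mid = primary(pos)
        if mid is not None and text[mid:mid + 1] == "*":
            end = multitive(mid + 1)
            if end is not None:
                return end
        return mid

    @lru_cache(maxsize=None)
    def primary(pos):
        # Primary <- '(' Additive ')' / Digit
        if text[pos:pos + 1] == "(":
            inner = additive(pos + 1)
            if inner is not None and text[inner:inner + 1] == ")":
                return inner + 1
        if text[pos:pos + 1].isdigit():
            return pos + 1
        return None

    return additive


def recognizes(text):
    """True if the whole input matches Additive."""
    return make_parser(text)(0) == len(text)
```

Since there is a fixed number of rules and each is computed at most
once per position, the cache bounds the total work, at the cost of
memory proportional to the input length.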

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
