Re: Kannapinn in a Nutshell


Related articles
Kannapinn in a Nutshell joachim_d@gmx.de (Joachim Durchholz) (2003-04-20)
Re: Kannapinn in a Nutshell cdc@maxnet.co.nz (Carl Cerecke) (2003-04-27)
Re: Kannapinn in a Nutshell joachim_d@gmx.de (Joachim Durchholz) (2003-04-27)
Re: Kannapinn in a Nutshell soenke.kannapinn@wincor-nixdorf.com (Sönke Kannapinn) (2003-05-06)
Re: Kannapinn in a Nutshell cfc@world.std.com (Chris F Clark) (2003-05-06)
Re: Kannapinn in a Nutshell cdc@maxnet.co.nz (Carl Cerecke) (2003-05-13)
Re: Kannapinn in a Nutshell soenke.kannapinn@wincor-nixdorf.com (Sönke Kannapinn) (2003-05-14)
Re: Kannapinn in a Nutshell soenke.kannapinn@wincor-nixdorf.com (Sönke Kannapinn) (2003-05-16)
Re: Kannapinn in a Nutshell joachim.durchholz@web.de (Joachim Durchholz) (2003-06-20)

From: Chris F Clark <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 6 May 2003 01:04:08 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 03-04-075 03-04-081
Keywords: parse
Posted-Date: 06 May 2003 01:04:07 EDT

Carl Cerecke wrote:
> Surely reductions needn't be any more complicated than normal. If I
> understand it, the partial items (a delightfully obvious idea, but
> only in retrospect) used in CLR are only used for the construction of
> the parsing automata. The actual LR parsing algorithm is the same as
> for LR/SLR/LALR and doesn't need the items (partial or full) that are
> in each state. For a state that has a reduction, that reduction is
> known. No scanning backwards down the stack is necessary. In fact,
> I'm pretty sure that if the reduction wasn't known explicitly, and had
> to be worked out from the stack, then conflicts would arise where more
> than one reduction would match the information on the stack.


My German is not good enough to read the thesis either, so I'm
basing these comments on Joachim's wonderful summary.


BTW, reading that Joachim is looking for work and seeing the quality
of his summary, I suggested potentially bankrolling a translation
project. However, I could only afford to contribute on the order of
$1K (USD) to such a project, and he estimated the job at a couple of
months' work, so that is hardly just compensation. Still, if others
were willing to contribute, perhaps we could pool enough money to
make the attempt feasible.


In any case, in the summary Joachim wrote:
> A standard LR parser uses "items" of the form (A -> a~ . b~, z), where
> a~ and b~ are strings of grammar symbols, A -> a~ b~ is a grammar rule
> and z is a lookahead.
> CLR parser items drop the A and a~ and consist just of (b~, z).


I presume that the "partial" items, which lack all left context
including the non-terminal that the reduction derives, get merged
without reference to the deriving non-terminals. As a result, one
cannot use a standard LR parsing automaton to perform the reduction,
as you don't know what symbol the reduction derives.
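

To make the two item shapes concrete, here is a minimal sketch in
Python (my own notation and names, not anything from the thesis).
Note how two full LR(1) items that differ only in their left context
collapse into the same partial item:

from collections import namedtuple

# Full LR(1) item: (A -> a~ . b~, z), split into its four parts.
LRItem = namedtuple("LRItem", "lhs before after lookahead")

# CLR "partial" item: just the unrecognized suffix b~ and lookahead z.
CLRItem = namedtuple("CLRItem", "after lookahead")

def to_partial(item):
    # Drop the deriving non-terminal A and the matched prefix a~.
    return CLRItem(item.after, item.lookahead)

# Two LR(1) items from the example grammar below, B: a .a and C: a .a,
# both with lookahead a.  They are distinct as LR(1) items but become
# identical once the left context is dropped, so their states merge.
i1 = LRItem(lhs="B", before=("a",), after=("a",), lookahead="a")
i2 = LRItem(lhs="C", before=("a",), after=("a",), lookahead="a")
assert i1 != i2
assert to_partial(i1) == to_partial(i2)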


For example, here is a small grammar for consideration that should
illustrate the point:


A: B C a;
B: a a;
C: a a;


Here is what I would assume is the construction of the CLR automaton.
The parts in brackets are not kept in the CLR representation.


state 0: // start state
      kernel:
                [goal:] .A, eof
      closure:
                [A: ] .B C a, eof
                [B: ] .a a, a
      transitions:
                a -> state 1
      goto:
                A -> state 3
                B -> state 4


state 1:
      kernel:
                [B: a] .a, a
      transitions:
                a -> state 2


state 2:
      kernel:
                [B: a a] ., a
      transitions:
                a -> reduce B


state 3: // this is the accept state
      kernel:
                [goal: A] ., eof
      transitions:
                eof -> accept


state 4:
      kernel:
                [A: B] .C a, eof
      closure:
                [C: ] .a a, a
      transitions:
                a -> state 7 // see below
      goto:
                C -> state 5


state 5:
      kernel:
                [A: B C] .a, eof
      transitions:
                a -> state 6


state 6:
      kernel:
                [A: B C a] ., eof
      transitions:
                eof -> reduce A


state 7: // before minimization (merging)
      kernel:
                [C: a] .a, a


Note that since the left context is removed (in particular, the
symbol which derives the rule), this state's CLR items are identical
to those of state 1. Thus, if I understand the suggestion correctly,
this state will be merged with state 1. As a result, state 2 can
reduce either a B or a C, depending upon the context (i.e., what's on
the stack). This merging is what distinguishes CLR parsing from
SLR/LALR parsing.
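

Here is a sketch of what that context-dependent reduce could look
like at runtime. This is purely my illustration, not Kannapinn's
actual mechanism, and it uses a stack of symbols for clarity where a
real LR parser stacks states:

def reduce_in_merged_state(stack):
    # Merged state: the partial item (., a) says "reduce a rule whose
    # right side was 'a a'", but not whether the LHS is B or C.
    assert stack[-2:] == ["a", "a"]
    del stack[-2:]
    # In this grammar the first "a a" handle must be a B (only B can
    # start the sentence); an "a a" handle on top of a B must be a C.
    lhs = "C" if stack and stack[-1] == "B" else "B"
    stack.append(lhs)
    return lhs

stack = ["a", "a"]                            # after shifting two a's
assert reduce_in_merged_state(stack) == "B"   # stack is now ["B"]
stack += ["a", "a"]
assert reduce_in_merged_state(stack) == "C"   # stack is now ["B", "C"]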


Joachim Durchholz also asked:
> IOW if you are in conflict trouble and the conflict persists for CLR, a
> stronger LR parser won't help: you'll have to rewrite the grammar, or
> use a stronger parsing technology (is there any between LR and GLR/Tomita?).


Yes, there are several [non-trivial] classes of parsers between LR
and GLR. LR(regular) and non-canonical LR (which allows non-terminals
as lookaheads) are two that have been published.


> 4. CLR variants of SLR parsers are impractical.


I would disagree with the point that combining CLR and SLR parsing is
useless. I think there is a very useful CSLR parsing technology. The
reason CLR and SLR look like they don't combine is that the CLR(0)
machine (i.e. the machine where one throws away the lookahead, or
right context) also appears to have a single reduce state, which has
to look back down the stack to determine what to reduce to. However,
there is a(n unpublished) technique I call left-edge sensitive LR
parsing (which we use in Yacc++) that saves the deriving symbol at
the left end of a rule, and that allows one to know exactly what to
reduce to when coming to the "unified" reduction state.
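

Since the technique is unpublished, the following Python sketch is
only my restatement of that one-sentence description, with
hypothetical names; it is not the Yacc++ implementation. The idea:
push a marker naming the deriving symbol at the rule's left edge, and
let the unified reduce state pop back to it:

def shift_left_edge(stack, lhs):
    # On entering the left edge of a rule for lhs, record the deriving
    # symbol on the stack -- the "left-edge sensitive" part.
    stack.append(("edge", lhs))

def unified_reduce(stack):
    # The single unified reduce state: pop back to the left-edge
    # marker, which yields both the handle's extent and the LHS.
    while not (isinstance(stack[-1], tuple) and stack[-1][0] == "edge"):
        stack.pop()
    _, lhs = stack.pop()
    stack.append(lhs)
    return lhs

stack = []
shift_left_edge(stack, "B")   # the parser predicted B: a a
stack += ["a", "a"]
assert unified_reduce(stack) == "B" and stack == ["B"]

This glosses over how the parser knows which rule's left edge it is
entering when several rules begin in the same state; presumably that
is where the real work in the technique lies.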


Actually, the thought is quite intriguing: one could apply the
traditional DFA state minimization algorithm to a left-edge sensitive
CSLR parser and produce what is arguably the "minimal" parsing
automaton for that grammar.
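

As a sketch of what applying the traditional algorithm would mean
here (textbook Moore-style partition refinement, nothing specific to
the thesis or Yacc++), with the parse actions folded into the state
labels so that only behaviorally identical states merge:

def minimize(states, alphabet, delta, label):
    # Start with states grouped by observable action (shift, reduce-X,
    # accept); split any block whose members move to different blocks
    # on some symbol; repeat until the partition is stable.
    block = {s: label(s) for s in states}
    while True:
        sig = {s: (block[s],
                   tuple(block.get(delta.get((s, x))) for x in alphabet))
               for s in states}
        if len(set(sig.values())) == len(set(block.values())):
            return block                    # no block was split
        block = sig

# Toy check: two states with identical behavior land in one block,
# like states 1 and 7 in the example above.
blocks = minimize(states={1, 7, 2}, alphabet=["a"],
                  delta={(1, "a"): 2, (7, "a"): 2},
                  label=lambda s: "reduce" if s == 2 else "shift")
assert blocks[1] == blocks[7]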


It also seems possible to go the other way and use state splitting
techniques (a la Spector or Pager) to generate the CLR or SLR state
machine, or even the full LR state machine, from the CSLR machine.


> 6. LR parsing for extended CFGs (ECFG)
. . .
> 6a. Most existing literature on ECFG parsing is worthless.


Kannapinn raises some good points here, and I've discussed them with
him. However, I don't think the literature is as useless as he
suggests. Now, the literature would have one think that ELR parsing
is hard and requires lookback automata and other complicated
mechanisms. And Kannapinn would have one believe that the
literature's "positive" results are the product of handwaving and of
glossing over certain forms of ambiguities.


However, neither is completely true in my opinion. One can parse ELR
grammars quite simply if one builds the right kind of automaton. One
has the option either of ignoring certain kinds of ambiguities at
parser generation time (and dealing with them at runtime) or of using
the techniques Kannapinn suggests to remove them. In practice, the
deal-with-them-at-runtime approach is quite effective for most
grammar writers, as is evidenced by our Yacc++ customers. A better
tool would integrate Kannapinn's techniques for even more robust
semantics.
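

To make that concrete, here is a toy Python illustration of the kind
of ambiguity an extended right-hand side introduces; this is my loose
analogy in regex terms, not anything from the thesis or Yacc++. With
a rule like B: a* a*, the input "aaa" has four derivations, and one
can either reject the rule statically (Kannapinn's route) or fix a
policy and resolve it at runtime, e.g. make the first repetition
greedy, as regular expressions do:

import re

splits = [("a" * i, "a" * (3 - i)) for i in range(4)]  # 4 derivations
greedy = re.fullmatch(r"(a*)(a*)", "aaa").groups()     # runtime policy
assert greedy == ("aaa", "") and greedy in splits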


Some days, I must admit, I wish I could do more parsing theory as
part of my job....
-Chris


*****************************************************************************
Chris Clark                    Internet : compres@world.std.com
Compiler Resources, Inc.       Web Site : http://world.std.com/~compres
3 Proctor Street               voice    : (508) 435-5016
Hopkinton, MA 01748 USA        fax      : (508) 435-4847 (24 hours)

