6 May 2003 01:04:08 -0400

Related articles |
---|

Kannapinn in a Nutshell joachim_d@gmx.de (Joachim Durchholz) (2003-04-20) |

Re: Kannapinn in a Nutshell cdc@maxnet.co.nz (Carl Cerecke) (2003-04-27) |

Re: Kannapinn in a Nutshell joachim_d@gmx.de (Joachim Durchholz) (2003-04-27) |

Re: Kannapinn in a Nutshell soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-05-06) |

Re: Kannapinn in a Nutshell cfc@world.std.com (Chris F Clark) (2003-05-06) |

Re: Kannapinn in a Nutshell cdc@maxnet.co.nz (Carl Cerecke) (2003-05-13) |

Re: Kannapinn in a Nutshell soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-05-14) |

Re: Kannapinn in a Nutshell soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-05-16) |

Re: Kannapinn in a Nutshell joachim.durchholz@web.de (Joachim Durchholz) (2003-06-20) |

From: | Chris F Clark <cfc@world.std.com> |

Newsgroups: | comp.compilers |

Date: | 6 May 2003 01:04:08 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 03-04-075 03-04-081 |

Keywords: | parse |

Posted-Date: | 06 May 2003 01:04:07 EDT |

Carl Cerecke wrote:

*> Surely reductions needn't be anymore complicated than normal. If I*

*> understand it, the partial items (a delightfully obvious idea, but*

*> only in retrospect) used in CLR are only used for the construction of*

*> the parsing automata. The actual LR parsing algorithm is the same as*

*> for LR/SLR/LALR and doesn't need the items (partial or full) that are*

*> in each state. For a state that has a reduction, that reduction is*

*> known. No scanning backwards down their stack is necessary. In fact,*

*> I'm pretty sure that if the reduction wasn't known explicitly, and had*

*> to be worked out from the stack, then conflicts would arise where more*

*> than one reduction would match the information on the stack.*

My German is not good enough to read the thesis either and so I'm

basing these comments on Joachim's wonderful summary.

BTW, reading that Joachim is looking for work and seeing the quality

of his summary, I suggested potentially bankrolling a translation

project. However, I could only afford to contribute on the order of

$1K (USD) to such a project and he estimated it at a couple of months

work--not just compensation at all. Still, if others were willing to

contribute, perhaps we could provide a large enough pool that it would

be feasible to attempt.

In any case, in the summary Joachim wrote:

*> A standard LR parser uses "items" of the form (A -> a~ . b~, z), where*

*> a~ and b~ are strings of grammar symbols, A -> a~ b~ is a grammar rule*

*> and z is a lookahead.*

*> CLR parser items drop the A and a~ and consist just of (b~, z).*

I presume that the "partial" items, which lack all left context

including the non-terminal that the reduction derives, get merged

without reference to the deriving non-terminals. As a result, one

cannot use a standard LR parsing automata to perform the reduction as

you don't know what symbol the reduction derives.

For example, here is a small grammar for consideration that should

illustrate the point:

A: B C a;

B: a a:

C: a a;

Here is what I would assume is the construction of the CLR automata.

The parts in brackets are not kept in the CLR representation.

state 0: // start state

kernel:

[goal:] .A, eof

closure:

[A: ] .B C a, a

[B: ] .a a, a

transitions:

a -> state 1

goto:

A -> state 3

B -> state 4

state 1:

kernel:

[B: a] .a, a

transitions:

a -> state 2

state 2:

kernel:

[B: a a] ., a

transitions:

a -> reduce B

state 3: // this is the accept state

kernel:

[goal: A] ., eof

transitions:

eof -> accept

state 4:

kernel:

[A: B] .C a, eof

closure:

[C: ] .a a, a

transitions:

a -> state 7 // see below

goto:

C -> state 5

state 5:

kernel:

[A: B C] .a, eof

transitions:

a -> state 6

state 6:

kernel:

[A: B C a] ., eof

transitions:

eof -> reduce A

state 7: // before minimization (merging)

kernel:

[C: a] .a, a

Note that since the left context is removed (in particular the symbol

which derives the rule), this state has identical CLR items to state

1. Thus, if I understand what is suggested correctly, this item will

be merged with state 1. As a result state 2 can either reudce a B or

a C, depending upon the context (i.e. what's on the stack). This

merging is what distinguishes CLR parsing from SLR/LALR parsing.

Joachim Durchholz also asked:

*> IOW if you are in conflict trouble and the conflict persists for CLR, a*

*> stronger LR parser won't help: you'll have to rewrite the grammar, or*

*> use a stronger parsing technology (is there any between LR and GLR/Tomita?).*

Yes, there are several [non-trivial] classes of parsers between LR and

GLR. The LR(regular) and non-canonical LR (allowing non-terminals as

lookaheads) are two that have been published.

*> 4. CLR variants of SLR parsers are impractical.*

I would disagree with the point that combining CLR and SLR parsing is

useless. I think there is a very useful CSLR parsing technology. The

reason why it looks like CLR and SLR don't combine is that the CLR(0)

machine (i.e. that machine where one throws away the lookahead or

right context also appears to have one reduce state which has to look

back on the stack to determine what to reduce to). However, there is

a(n unpublished) technique I call left-edge sensititive LR parsing

(which we use in Yacc++) that saves the deriving symbol at the left

end of a rule, and that that allows one to know exactly what to reduce

to when coming to the "unified" reduction state.

Actually, the thought is quite intriguing, one could apply the

traditional DFA state minimization algorithm to a left-edge sensititve

CSLR parser and produce what is agruably the "minimal" parsing

automata for that grammar.

It also seems possible to go the other way too and use state splitting

techniques (ala Spector or Pagel) to generate either the CLR or SLR

state machine or even the full LR state machine from the CSLR machine.

*> 6. LR parsing for extended CFGs (ECFG)*

. . .

*> 6a. Most existing literature on ECFG parsing is worthless.*

Kannepinn raises some good points here and I've discussed them with

him. However, I don't think the the literature is as useless as he

suggests. Now, the litrature would have one think that ELR parsing is

hard and requires lookback automata and other complicated mechanisms.

And, Kannepinn would have one believe that their "positive" results

are the result of handwaving and glossing over certain forms of

ambiguities.

However, neither is completely true in my opinion. One can parse ELR

grammars quite simply if one builds the right kind of automaton. One

has either the option of ignoring certain kinds of ambiguities at

parser generation time (and dealing with them at runtime) or by using

the techniques Kannepinn suggests removing them. In practice, the

deal with them at runtime approach is quite effective for most grammar

writers, as is evidenced by our Yacc++ customers. A better tool would

integrate Kannepinn's techinques for even more robust semantics.

Somedays I must admit, I wish I could do more parsing theory as part

of my job....

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

3 Proctor Street voice : (508) 435-5016

Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.