12 Jun 2005 21:30:26 -0400

Related articles |
---|

Opinions about "epsilon" Symbols in Parse Trees drikosv@otenet.gr (Evangelos Drikos) (2005-06-09) |

Re: Opinions about "epsilon" Symbols in Parse Trees schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-06-10) |

Re: Opinions about "epsilon" Symbols in Parse Trees DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-10) |

Re: Opinions about "epsilon" Symbols in Parse Trees mefrill@yandex.ru (mefrill) (2005-06-12) |

Re: Opinions about "epsilon" Symbols in Parse Trees news4e71@yahoo.com (0x4e71) (2005-06-12) |

Re: Opinions about "epsilon" Symbols in Parse Trees DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-12) |

Re: Opinions about "epsilon" Symbols in Parse Trees drikosv@otenet.gr (eDrikos) (2005-06-13) |

From: | "mefrill" <mefrill@yandex.ru> |

Newsgroups: | comp.compilers |

Date: | 12 Jun 2005 21:30:26 -0400 |

Organization: | http://groups.google.com |

References: | 05-06-052 |

Keywords: | parse, LR(1) |

Posted-Date: | 12 Jun 2005 21:30:26 EDT |

The epsilon problem in LR(k) analysis is really wider problem of

ambiguity. Now I'll tey explain this. Firstly, you should remember LR

algorithm does not work with rules but works with states of

LR-automata. State contain signed rules - rules within point inside

right part of the rule. For example, rule id --> ids idp can produce

three signed rules: id --> * ids idp, id --> ids * idp, id --> ids idp

*. LR(k) generator firstly (befroe any parsing) builds LR(k)-automata.

The state of LR(k)-automata is LR(k)-item, which consists of two

components: signed rule and lookahead string. Lookahead string is just

the sequence of k terminals of the grammar. For example, LR(1)-item's

lookahead string contains only the single terminal symbol. We'll

discuss LR(0)-items with empty lookahead string to simplify our

discussion. Take you grammar and build LR(0)-automata for this as an

example and illistration of the problem. Lets start from grammar

description:

id --> ids idp

ids --> LL

idp --> LL

idp --> D

idp -->

And build LR(0)-automata for this:

State 0 is initial state. So, we put start rule there:

id --> * ids idp.

So far, the state contains only the single state and nonterminal symbol

ids is just after the point in signed rule. So, apply closure operation

i.e. add to the state all signed rules of the grammar where ids is on

the left part. It is only the single such the rule ids --> LL. So, the

state number 0 is now follows:

id --> * ids idp

ids --> * LL

Now, build moves from this state. Move means really point move one

symbol to right. There are two moves: on symbols ids and LL. So, there

may be two additinal states:

State 1 contains

ids --> LL *

Here closure operation cannot be applied as there is not any symbol

after the point. And state 2

id --> ids * idp

It is time to apply closure to the symbol idp. We get

id --> ids * idp

idp --> * LL

idp --> *D

idp --> *

Build three additional state on moves from state 2.

State 3 on LL move

idp --> LL *

State 4 on D move

idp --> D *

And state 5 on idp move

id --> ids idp *

For the moment, LR(0)-automata is build (there are not any move). But

the information we can get from the automata is not enough to do

LR-analysis. The problem is in move on nonterminal. We know how to do

moving on terminal, when we read it from input string. So, starting

from initial state 0 we can easy move in state 1 by reading symbol LL

from input string. But, how to move on ids??? The solution is taking a

look on state with LR-item where point is on the last position in right

part of the rule. In our automata we have all such the states except

state 0. What does it mean the point on the last position in the signe

rule? It means that some rule (and some nonterminal!) is parsed. So, in

state 1 we have parsed rule ids --> LL and it means we have parsed

symbol ids. So, we can move from state 0 to state 2 on symbol ids. Why

from state 0? Beacause it contains rule ids --> * LL where point is on

the start position and via the sequence of moves (really only one move)

we can get state 1. It is named reduce in LR-analysis. The move on some

symbol is named as shift. In each state we must know what to do,

wherther reduce or shift. If we have more then one possibility it is

named as ambiguity. We have ambuity in state 2. We can shift on LL,

shift on D and reduce on idp in the same state 2 and after go to state

5. And, if the problem with mmoves on LL and D can be avoided by using

LR(1)-automata instead of LR(0), the empty rule idp --> * cannot be

avoided for any k in LR(k). We will have shift/reduce conflict. How to

decide this? The approach was intended in 1974 by Bernard Lang and is

based really on very simple idea. Every time when we have ambiguity

comflict clone the parse stack and parse on them in parallel. So, we

may have possiblly many many stacks with many symbols and state numbers

in each of them and every time when we read symbol from the input

string we do parsing on each stack. Of course, it may be not so soon,

but it is the payment for ambiguity. In 80th Masari Tomita suggested do

not clone stacks of symbols, only clone references on them. So, we have

references and it means we have graph where vertexes are either state

numbers or grammar symbols and references between them are edges.

Tomita named such the graph as Graph Structured Stack - GSS and the

Tomita's algorithm worked on this. Tomita intended five algorithms amd,

it's interesting, not one of them worked correctly on any grammar. The

problem was still with empty rule. In firs part of 90th Tomita's

disciple Farshi (I am not sure about right name) intended new

algorithm, which worked correctly on any grammar. But, the

computational complexity of the algorithm was very high. And in 2000th

three guys from London, Scott, Johnstone and Hussain, decided the

problem with empty rule by modification LR-automata (adding additional

moves on empty rules). So, you really need to this algorithm. Below I

put some publications list where you can find needed information.

1. B. Lang. Deterministic techniques for efficient non-deterministic

parsers. Automata, Languages and Programming: 2nd Colloquium, pages

255-269, 1974.

2. M. Tomita. An Efficient Augmented-Context-Free Parsing Algorithm.

Pittsburgh, Carnegie-Mellon University, 1987.

3. M. Tomita. Efficient parsing for natural language. Kluwer Academic

Publishers, Boston, 1986.

4. R. Nozohoor-Farshi. GLR parsing for -grammars. In Masaru Tomita,

editor, Generalized LR parsing, pages 61-75. Kluwer Academic

Publishers, Netherlands, 1991,

5. E. Scott, A. Johnstone, S. S. Hussain. Tomita-Style Generalized LR

Parsers. Royal Holloway University of London, 2000.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.