25 Oct 91 10:11:28 GMT

Related articles |
---|

[9 earlier articles] |

Re: LR(n) parsers rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell) (1991-10-19) |

Re: LR(n) parsers schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) (1991-10-19) |

Re: LR(n) parsers drw@riesz.mit.edu (1991-10-22) |

Re: LR(n) parsers schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) (1991-10-24) |

Re: LR(n) parsers sankar@Neon.Stanford.EDU (Sriram Sankar) (1991-10-24) |

Re: LR(n) parsers ge@sci.kun.nl (1991-10-25) |

Re: LR(n) parsers schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) (1991-10-25) |

Re: LR(n) parsers markh@csd4.csd.uwm.edu (1991-11-06) |

Newsgroups: | comp.compilers,comp.theory |

From: | Thomas Schoebel <schoebel@bs5.informatik.uni-stuttgart.de> |

Keywords: | parse, theory, LR(1) |

Organization: | IfI, Univ. Stuttgart, W Germany |

References: | 91-10-036 91-10-093 |

Date: | 25 Oct 91 10:11:28 GMT |

In article 91-10-093 schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) writes:

*> No, it's even the same for k=0. See Hopcroft/Ullman page 256, where it is*

*> shown that any deterministic pushdown automata (DPDA) can be converted to*

*> an LR(0) grammar. Since any LR(k) (k>0) parser is in fact a DPDA, the*

*> *language classes* are equal for any k. *

Sorry, I have to correct myself: The language class LR(0) as defined in

Hopcroft/Ullman is a strict subset of the LR(1) class. This is because of

the prefix property. In Hopcroft/Ullman, any deterministic context-free

language (even without the prefix property) can be converted to an LR(1)

grammar. LR(0) grammars cannot generate languages without prefix property.

However, this distinction is only of theoretical interest to a compiler

writer. In practice, *ANY* parser uses a special end-of-file symbol, and

*ANY* scanner must be designed so that distinction of "an input symbol is

present" and "no more input symbol is present" is carried out. This is

simply because a "non-existant symbol" is something we have to deal with

and therefore must be explicitly or implicitely stated somewhere. If it

were possible to omit this distinction, *ANY* parser (even LR(1) when

using the lookahead) could not decide at a point where it is possible to

accept, whether to continue (and hopefully find a matching input) or to

accept in this situation.

So in *practice* the language classes LR(0) and LR(1) are equal. The

theoretical difference comes only from the fact that "acceptance"

of a pushdown automaton is defined in Hopcroft/Ullman by two conditions:

1) The automaton has reached an accepting state

2) The input word is fully processed

This definition inherently contains a nondeterminism, so IMHO the

Hopcroft/Umman definiton of "deterministic pushdown automaton" (DPDA)

is in some sense not fully correct.

This can be seen if the DPDA has to be simulated by a deterministic turing

machine (DTM): With turing machines, acceptance is usually defined by

simply reaching an accepting state. If the turing machine has to determine

the above condition 2), it has to look "after" the input word whether

there is a blank. The turing machine *MUST* contain a transition for blank

in its \delta function if it has to decide this deterministically. But

augmenting the alphabet by a new symbol (call it either "EOF" or "blank")

which has to follow the original word and testing for its presence is

nothing but changing the language so that it has the prefix property. The

DTM has to *touch* the blank, and this is something not belonging to the

input. In other words, simulation of DPDA's by DTM's is in general

impossible without transforming the input language.

To me it is questionable where a DPDA as defined in Hopcroft/Ullman is

really *deterministic* in full sense. That the transition relation \delta

must be a function rather than a true relation is not enough; it is also

important that leaving an accepting state by comsuming the next symbol is

not possible, because this is a HIDDEN NONDETERMINISM. Even in

Hopcroft/Ullman, a DPA is supposed to always look only at one input symbol

at a time, so consequently it has no chance to determine situation 2) of

above. The test of situation 2) is not done by the DPDA itself, rather

than by a "watching instance" of the computing process.

Suppose there are transitions out from an accepting state. Without

lookahead, the automaton now has to *decide* whether to accept or to

continue. If it decides to continue at the last symbol, then it dies and

hence would reject the input word. So this inherent nondeterministic

choice leads to acceptance in one case and to rejection in the other one.

Something is not well-defined here!!

It appears to me that deterministic context-free languages without prefix

property as defined in Hopcroft/Ullman are something which is not really

deterministic. IMHO, determinism should be better defined to imply the

prefix property automatically.

-- Thomas

[The LR(0)/LR(1) distinction was also noted by Sriram Sankar

<sankar@Neon.Stanford.EDU> -John]

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.