Sat, 25 Aug 2007 21:38:45 +0200

Related articles |
---|

Commonality in subset construction and LR set of items construction al cfc@shell01.TheWorld.com (Chris F Clark) (2007-08-24) |

Re: Commonality in subset construction and LR set of items constructio anton@mips.complang.tuwien.ac.at (2007-08-25) |

Re: Commonality in subset construction and LR set of items constructio schmitz@i3s.unice.fr (Sylvain Schmitz) (2007-08-25) |

Re: Commonality in subset construction and LR set of items constructio DrDiettrich1@aol.com (Hans-Peter Diettrich) (2007-08-26) |

Re: Commonality in subset construction and LR set of items constructio torbenm@app-6.diku.dk (2007-08-27) |

From: | Sylvain Schmitz <schmitz@i3s.unice.fr> |

Newsgroups: | comp.theory,comp.compilers |

Date: | Sat, 25 Aug 2007 21:38:45 +0200 |

Organization: | Compilers Central |

References: | 07-08-071 |

Keywords: | parse, LR(1) |

Chris F Clark wrote:

*> Recently the similarities between the subset construction algorithm to*

*> transform an NFA into a DFA and the LR set of items construction*

*> algorithm have been repeatedly thrust upon me, so much so that I have*

*> a hard time as seeing them as anything but one algorithm.*

*>*

*> Is this similarity a well known fact that I just somehow didn't learn*

*> or forgot? Do the algorithms share some common root? In fact, is*

*> there a common basic algorithm underlying both of these algorithms*

*> that unifies them (and perhaps a bunch of other algorithms that I'm*

*> not even aware of)? If so, what is that algorithm called and what are*

*> some other examples of its application?*

*>*

*> If this is something that I should know, please correct my deficiency*

*> now....*

The similarity is no coincidence. The idea of LR parsing is that a

handle in a sentential form, i.e. the part of the sentential form which

should be reduced in canonical bottom-up parsing, can be identified by a

finite state automaton---sometimes dubbed handle-finding automaton.

Take for instance the grammar with rules

S -> A S a | a

A -> b

A sentence

b b a a a

is also a sentential form, and in bottom-up parsing, the first "b"

should be reduced to "A", yielding the sentential form

A b a a a

The second "b" should be reduced after reading "Ab", then the first "a"

after reading "AAba", and then "ASa" after reading "AASa" and so on. It

happens that, for any context-free grammars, the language of terminals

and nonterminals prefixes up to the point where a reduction should occur

is a regular language, and can be computed from the grammar. For our

example, we can devise a right linear grammar for this language, using

"[A]" and "[S]" as nonterminals, and "a", "b", "A" and "S" as terminals:

[S] -> [A] | A [S] | A S a | a

[A] -> b

The language described by this characteristic grammar gives exactly

where the next reduction should occur. Observe that this grammar

derives the prefixes mentioned earlier. Constructing a deterministic

finite state automaton from this grammar to find the handles in

sentential forms ends up being the same as the LR item subset construction.

As far as I remember, all this was shown by Knuth in the original LR paper.

--

Hope that helps,

Sylvain

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.