Re: Commonality in subset construction and LR set of items construction algorithms

Sylvain Schmitz <schmitz@i3s.unice.fr>
Sat, 25 Aug 2007 21:38:45 +0200

          From comp.compilers

Related articles
Commonality in subset construction and LR set of items construction al cfc@shell01.TheWorld.com (Chris F Clark) (2007-08-24)
Re: Commonality in subset construction and LR set of items constructio anton@mips.complang.tuwien.ac.at (2007-08-25)
Re: Commonality in subset construction and LR set of items constructio schmitz@i3s.unice.fr (Sylvain Schmitz) (2007-08-25)
Re: Commonality in subset construction and LR set of items constructio DrDiettrich1@aol.com (Hans-Peter Diettrich) (2007-08-26)
Re: Commonality in subset construction and LR set of items constructio torbenm@app-6.diku.dk (2007-08-27)
| List of all articles for this month |

From: Sylvain Schmitz <schmitz@i3s.unice.fr>
Newsgroups: comp.theory,comp.compilers
Date: Sat, 25 Aug 2007 21:38:45 +0200
Organization: Compilers Central
References: 07-08-071
Keywords: parse, LR(1)
Posted-Date: 26 Aug 2007 10:20:19 EDT

Chris F Clark wrote:
> Recently the similarities between the subset construction algorithm to
> transform an NFA into a DFA and the LR set of items construction
> algorithm have been repeatedly thrust upon me, so much so that I have
> a hard time as seeing them as anything but one algorithm.
>
> Is this similarity a well known fact that I just somehow didn't learn
> or forgot? Do the algorithms share some common root? In fact, is
> there a common basic algorithm underlying both of these algorithms
> that unifies them (and perhaps a bunch of other algorithms that I'm
> not even aware of)? If so, what is that algorithm called and what are
> some other examples of its application?
>
> If this is something that I should know, please correct my deficiency
> now....


The similarity is no coincidence. The idea of LR parsing is that a
handle in a sentential form, i.e. the part of the sentential form which
should be reduced in canonical bottom-up parsing, can be identified by a
finite state automaton---sometimes dubbed handle-finding automaton.
Take for instance the grammar with rules


      S -> A S a | a
      A -> b


A sentence


      b b a a a


is also a sentential form, and in bottom-up parsing, the first "b"
should be reduced to "A", yielding the sentential form


      A b a a a


The second "b" should be reduced after reading "Ab", then the first "a"
after reading "AAba", and then "ASa" after reading "AASa" and so on. It
happens that, for any context-free grammars, the language of terminals
and nonterminals prefixes up to the point where a reduction should occur
is a regular language, and can be computed from the grammar. For our
example, we can devise a right linear grammar for this language, using
"[A]" and "[S]" as nonterminals, and "a", "b", "A" and "S" as terminals:


      [S] -> [A] | A [S] | A S a | a
      [A] -> b


The language described by this characteristic grammar gives exactly
where the next reduction should occur. Observe that this grammar
derives the prefixes mentioned earlier. Constructing a deterministic
finite state automaton from this grammar to find the handles in
sentential forms ends up being the same as the LR item subset construction.


As far as I remember, all this was shown by Knuth in the original LR paper.
--
Hope that helps,


      Sylvain


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.