Re: Bermudez-Logothetis LALR(1) Lookahead Algorithm

Rob Arthan <rda@lemma-one.com>
24 Feb 2003 18:05:24 -0500

          From comp.compilers

Related articles
Bermudez-Logothetis LALR(1) Lookahead Algorithm rda@lemma-one.com (Rob Arthan) (2003-02-21)
Re: Bermudez-Logothetis LALR(1) Lookahead Algorithm rda@lemma-one.com (Rob Arthan) (2003-02-24)
Re:Bermudez-Logothetis LALR(1) Lookahead Algorithm robert.thorpe@antenova.com (Rob Thorpe) (2003-03-09)
| List of all articles for this month |

From: Rob Arthan <rda@lemma-one.com>
Newsgroups: comp.compilers
Date: 24 Feb 2003 18:05:24 -0500
Organization: Lemma 1 Ltd.
References: 03-02-126
Keywords: parse, LALR
Posted-Date: 24 Feb 2003 18:05:24 EST

Rob Arthan wrote:


> I am working on an implementation of the Bermudez-Logothetis method for
> calculating LALR(1) lookahead sets (from their paper Simple Computation of
> LALR(1) Lookahead Set, in Information Processing Letters 31(1989), pp.
> 233-238). I'm actually upgrading an SLR(1) implementation, so their
> treatment is very convenient for me because I have most of the harder
> algorithms coded and tested already.
>
> Their method is to construct an auxiliary grammar. They don't give it a
> name, so I'm calling it the state transition grammar. The symbols of the
> state transition grammar are the transitions of the LR(0) automation.
> I.e., they are pairs written [p:X] where p is a state of the LR(0)
> automaton and X is a symbol of the original grammar which labels an edge
> out of p.
>
> If I'm reading the paper aright, Bermudez & Logothetis define the
> productions of the state transition grammar to be all productions of the
> form:
>
> [p_1:A] -> [p_1:X_1][p_2:X_2]...[p_n:X_n]
>
> where A -> X_1X_2...X_n is a production of the original grammar. Their
> formal definition doesn't make any reference to the state transition
> function of the LR(0) automaton.
>
> Am I right in thinking that this is overkill or am I missing something?
> Surely you only need toi consider the productions of the above form, where
> for each i, 1 <= i < n, there is a transition from p_i to p_{i+1} along an
> edge labelled with X_i. The diagrams in the paper strongly suggest this,
> but the formal definition of the productions in the state transition
> grammar doesn't.


Thanks to those who replied by e-mail urging me to tread cautiously -
I was, I assure you. I now have a working LALR(1) parser generator
implemented in and generating Standard ML (snd have tested the results
against an earlier SLR(1) parser generator and against bison). This
should appear in the next open source release of the ProofPower
system.


I have had a separate e-mail correspondence with Prof. Bermudez who
has kindly confirmed the following as the right reading of the
relevant parts of the paper (and so confirmed my thoughts above and
pointed out a mistake that I nearly made):


In the production:


         [p_1:A] -> [p_1:X_1][p_2:X_2]...[p_n:X_n]


the sequence of p_i and X_i is required to form a path in the graph of
the LR(0) automaton. However, the state, p_{n+1}, say, at the end of this
path will not normally be the state at the end of the edge [p_1:A].


So for example, in figure 5 of the paper, there is a production


        [1:S] -> [1:a][6:A][7:c]


but there is no production:


        [1:S] -> [1:a][6:A][13:c]


because, while [13:c] is one of the symbols in the state transition
grammar, there is no path from state 6 to state 13 via non-terminal A.


Moreover, the state at the end of the path [1:a][6:A][7:c] is 8, which is
not the same as the state at the end of the edge [1:S], which is 2. What
happens is that on state 8, reading end-of-input, the parser will reduce to
S and pop its stack until it gets back to state 1 and then it will move via
state 2 to the accepting state.


This all turns out to be wonderfully easy to implement - it only adds about
300 non-comment lines of Standard ML code to my earlier SLR(1)-based
parser generator (and about 80 of those are for diagnostics and listings).
I have been put off implementing LALR(1) for over 10 years because of the
uninspiring description of the over-complex (and slow) yacc algorithm in
Aho & Ullman - drat that dragon!


Rob Arthan (rda@lemma-one.com)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.