Re: Computing Follow set

Max Hailperin <max@gustavus.edu>
Fri, 07 Mar 2008 07:09:32 -0600

          From comp.compilers

Related articles
Computing Follow set pavan.mail@gmail.com (pavan) (2008-03-06)
Re: Computing Follow set rsc@swtch.com (Russ Cox) (2008-03-06)
Re: Computing Follow set torbenm@app-2.diku.dk (2008-03-07)
Re: Computing Follow set max@gustavus.edu (Max Hailperin) (2008-03-07)
Re: Computing Follow set haberg_20080207@math.su.se (Hans Aberg) (2008-03-08)
| List of all articles for this month |
From: Max Hailperin <max@gustavus.edu>
Newsgroups: comp.compilers
Date: Fri, 07 Mar 2008 07:09:32 -0600
Organization: Compilers Central
References: 08-03-033
Keywords: LR(1), comment
Posted-Date: 08 Mar 2008 10:57:27 EST

pavan <pavan.mail@gmail.com> writes:


> I have a question regarding the computation of FOLLOW sets.
> Consider the following grammar:
>
> A -> aB | a
> B -> bA | b
>
> From the production A -> aB, we have FOLLOW(B) contains FOLLOW(A).
> From the production B -> bA, we have FOLLOW(A) contains FOLLOW(B).
>
> This ends up being an infinite loop when I code it. I would appreciate
> your suggestions on this.
>
> Thank you.


Great question. In fact, beyond the issue of code going into an
infinite loop, there is even a question of what the FOLLOW sets should
be once the code gets done (somehow) computing them. For example, is
b in FOLLOW(A)? Well, it is if it is in FOLLOW(B). But is it in
FOLLOW(B)? Well, it is if it is in FOLLOW(A). So, it would be
consistent to conclude that it is in both FOLLOW sets -- or neither.
Of course, only one of those two consistent answers is the *correct*
one -- the one that describes the terminals that can actually come
after an A or B in a derivation. And the correct answer, as I hope
you'll see, is the one without b -- we want the smaller, junk-free
solution to the system of equations that defines the FOLLOW sets.
This is becauase in any derivation starting from A, the A or B will be
followed only by the end of input (generally symbolized as $).


This turns out to be an example of a general phenomenon that recurs
over and over again in compiler-related problems (as well as in some
other areas). Other examples would be the epsilong-closure process
used in translating an NFA to a DFA, the computation of FIRST sets,
and monotone dataflow analysis. In all these cases, we have a cyclic
system of constraints that describe some values, but which do not (in
general) uniquely specify them. The solution we want is the least
solution. We can find this least solution using the following
algorithmic approach (which is what you were asking for).


Initially approximate the values by the least element of their type;
for FOLLOW sets, this would mean to start by assuming they are all
empty. Then look for violated constraints. In your example, you would
notice that $ needs to be in FOLLOW(A), but is not yet, because
FOLLOW(A) is empty. So, increase the too-small value so as to correct
this constraint violation; that is, add $ into FOLLOW(A). Now repeat
this process of finding and repairing constraint violations so long as
there are any. In each case, the repair would consist of increasing
some value -- in the case of computing FOLLOW sets, adding something
into a FOLLOW set. In your example, your second check for a
constraint violation would find that FOLLOW(A) is not a subset of
FOLLOW(B), as you correctly observed it must be. To repair this, you
would add $ into FOLLOW(B). At this point, there would be no more
constraint violations, which would be the algorithm's sign that it
should terminate. So, to answer your question about the code looping,
the loop test should be whether there is any further motivation to add
any symbols to any of the FOLLOW sets.


The reason why this algorithm is sure to terminate is because we only
ever increase the FOLLOW sets and there is a limit to how large they
can grow; at most each contains all the terminals and $. Translating
this argument to a more general setting, the algorithm would terminate
not only for increasing sets that have some finite bound, but in any
case where we are increasing values in some partial order that doesn't
have infinitely ascending chains.


The somewhat more subtle question is how we know that the solution we
find upon termination is the one we wanted -- the junk-free one. In
fact, we want to be sure that we arrive at this same, correct,
solution no matter what choices we make along the way at any point
where there is more than one constraint violation we could choose to
repair.


Intuitively, the reason why we know we reach the correct, junk-free,
solution is that we start with empty sets and only add anything to
them when we had a good reason to. For a more rigorous argument, we
could recognize the constraint repair process as the application of
monotonic functions and use their monotonicity to construct an
inductive proof that their least fixed point is reached. To see how
this is done, I suggest you take a look at a pedagogic paper I wrote
some years back about teaching the computation of FIRST sets along
these lines: http://gustavus.edu/+max/ffp-sigcse.html


  -Max Hailperin
    Professor of Computer Science
    Gustavus Adolphus College


[Thanks to several other people who sent in shorter messages with similar
advice. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.