Re: Set of allowable next tokens

csr!nigelh@sol.uvic.ca (R. Nigel Horspool)
12 Jan 91 23:37:21 GMT

From comp.compilers

Related articles
Set of allowable next tokens jmr1g@ra.cs.Virginia.EDU (1991-01-11)
Set of allowable next tokens meissner@osf.org (1991-01-11)
*Re: Set of allowable next tokens csr!nigelh@sol.uvic.ca* (1991-01-12)**
Re: Set of allowable next tokens mah@nestroy.wu-wien.ac.at (1991-01-14)
Re: Set of allowable next tokens megatest!djones@decwrl.dec.com (1991-01-16)
Re: Set of allowable next tokens megatest!djones@decwrl.dec.com (1991-01-15)

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	csr!nigelh@sol.uvic.ca (R. Nigel Horspool)
Keywords:	yacc, parse, debug
Organization:	University of Victoria
References:	<1991Jan11.203916.20495@murdoch.acc.Virginia.EDU>
Date:	12 Jan 91 23:37:21 GMT

jmr1g@ra.cs.Virginia.EDU (Jeremiah M. Ratner) writes:

>Is there any way i can configure yacc or some other tool to tell me,
>at each step in a parse, the set of tokens that may follow the current
>token? I am currently doing this by hand, and it is, as they say,
>a tedious and error-prone process.

Yacc optimizes its tables to use default reductions, so it is
impossible to tell from the state listing in the y.output file
which symbols are valid in any state that contains a reduce
action.

If you are content with an approximate answer, it can be obtained
directly from LR parse tables that have not been optimized.
E.g., the row of the LR parse table for state n might contain
entries like:
symbol a: shift to state s1
symbol b: shift to state s2
symbols {c,d,e}: reduce by rule r1
symbols {f,g}: reduce by rule r2
and the set of symbols for which actions are defined
(i.e. {a,b,c,d,e,f}) is a good approximation to what symbols are
allowed when state n is the top state on the LR state stack.
This info is readily available from yacc's y.output file.

The set is correct (not an approximation) if a full LR(1) parser
generator is used. However, for the SLR(1) and LALR(1) methods
(Yacc uses LALR(1)), the set of symbols that trigger reduce actions
may be too large. To get an exact answer, it is necessary to take
context into account to eliminate some impossible next symbols from
the set.

If you need to know the exact set of valid next symbols at some
point in a parse, you can run an algorithm that simulates the updates
that occur to the state stack when each reduce action is performed.
An algorithm in pseudo Pascal goes something like this:

        ValidNextSymbols := Valid( SS, VT );
{ where SS is the current state stack and VT is the set of
all terminal symbols in the grammar}

        function Valid( SS, T )
Result := set of symbols, x, such that the top state
on stack SS has a shift action for x or an
accept action for x;
Result := Result * T;

for each reduce action in the top state on stack SS do
Test := set of symbols that trigger the reduce action;
Test := Test * T;
if Test != empty then
CSS := copy of SS;
update the stack CSS as required by the reduce action;
{ i.e. pop k states where k is the length of the RHS
of the rule being reduced by, and push the state
reached by the goto action for the LHS symbol of
the rule }
Result := Result + Valid( CSS, Test );
end if
end for;
return Result;
        end function;

{Note: `+' represents set union and `*' represents set intersection.}

Since the algorithm checks which reduce symbols are valid (i.e. those
that can eventually be shifted), it works equally well on parse tables
that have been optimized by the use of default reductions. I.e., it
could be implemented to work with the tables used by a yacc-generated
parser.

R. Nigel Horspool
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Set of allowable next tokens

csr!nigelh@sol.uvic.ca (R. Nigel Horspool)12 Jan 91 23:37:21 GMT

csr!nigelh@sol.uvic.ca (R. Nigel Horspool)
12 Jan 91 23:37:21 GMT