|Technical question about yacc and LALR parsers jar@florida.HQ.Ileaf.COMid AA07671; Sun, 21 Apr 91 (1991-04-20)|
|Re: Technical question about yacc and LALR parsers firstname.lastname@example.org (1991-04-22)|
|Re: Technical question about yacc and LALR parsers email@example.com (1991-04-22)|
|Re: Technical question about yacc and LALR parsers firstname.lastname@example.org (1991-04-22)|
|Re: Technical question about yacc and LALR parsers email@example.com (1991-04-23)|
|Technical question about yacc and LALR parsers firstname.lastname@example.org (1991-04-21)|
|From:||jar@florida.HQ.Ileaf.COM (Jim Roskind x5570) (5.61/UUNET-shadow-mx) id AA07671; Sun, 21 Apr 91 00:33:37 -0400|
|Keywords:||yacc, parse, question, errors|
|Date:||Sat, 20 Apr 91 22:04:33 EDT|
I have been spending this weekend hacking byacc to automate the
process that I go through to evaluate conflicts. One part of this is
walking backwards through a parser table. When I generated backwards
pointers for each state in the parse, I noticed something rather
surprising (at least to me). Basically, for each state k in the
parser, there are many states that can transition to k (no surprise),
but all such states transition with for the same reason (I was
surprised) i.e, all shifts/goto's into state k are made on the same
terminal/nonterminal. To put it another way, when the parser
generator is sitting in any given state k, the token just to the left
of the carrot is a function of k only.
When I first coded my data structures, I assumed that for each state
k, I would need a list of prior states, and I also wanted to save the
token name that was used to transition from the "prior state". When
all was said and done, I found that it was sufficient to save a "list
of prior states" for each state, as well as a single "prior token" for
If I haven't still made myself clear, the following is an example of
something that will *never* be found in the verbose output from yacc:
parened_type : '(' VOID . ')'
parened_type : '(' INT . ')'
Basically, if this was ever seen, it would mean that in state 727, it
was possible to have two distinct tokens to the immediate left of the
carrot. Note that the following is possible (I think):
gathered_type : '[' VOID . ']'
gathered_type : '(' VOID . ')'
as it is certainly *not* the case that the state defines the entire
left context of the carrot.
One corollary (via the pigeon hole principle) of this observation is
that there must me more states in such a parser than there are
terminals plus nonterminals.
Assuming that I have explained myself, the question is: Is this
property true of all LALR parsers? It was very conceivable that I
could hand code a table driven "LR like" parser that would *not* have
this property. Needless to say, my fear is that my "simplified data
structures" are not sufficient for all grammars that byacc might
Jim Roskind- Author of a YACCable C++ grammar
(407)729-4348 or (617)290-4990 x5570
email@example.com or ...!uunet!leafusa!jar
Return to the
Search the comp.compilers archives again.