Re: Can Pascal be parsed by LR(1) parsing algorithm?

jas@Ingres.COM (Jim Shankland)
Sun, 28 Oct 90 13:46:06 PST

          From comp.compilers

Related articles
[10 earlier articles]
Re: Can Pascal be parsed by LR(1) parsing algorithm? firth@sei.cmu.edu (1990-10-17)
Re: Can Pascal be parsed by LR(1) parsing algorithm? firth@sei.cmu.edu (1990-10-18)
Re: Can Pascal be parsed by LR(1) parsing algorithm? djones@megatest.uucp (1990-10-21)
Re: Can Pascal be parsed by LR(1) parsing algorithm? crocker@Alliant.COM (1990-10-23)
Re: Can Pascal be parsed by LR(1) parsing algorithm? piet@cs.ruu.nl (1990-10-26)
Re: Can Pascal be parsed by LR(1) parsing algorithm? andy@Theory.Stanford.EDU (1990-10-26)
Re: Can Pascal be parsed by LR(1) parsing algorithm? jas@Ingres.COM (1990-10-28)
Re: Can Pascal be parsed by LR(1) parsing algorithm? firth@sei.cmu.edu (1990-11-05)
| List of all articles for this month |

Newsgroups: comp.compilers
From: jas@Ingres.COM (Jim Shankland)
Keywords: Pascal, LL(1)
Organization: The Eddie Group
References: <9112@fy.sei.cmu.edu) <9010232339.AA20860@Alliant.COM> <1990Oct26.220803.1411@Neon.Stanford.EDU>
Date: Sun, 28 Oct 90 13:46:06 PST

In article <1990Oct26.220803.1411@Neon.Stanford.EDU> andy@Theory.Stanford.EDU (Andy Freeman) writes:
>In article <9010232339.AA20860@Alliant.COM> crocker@Alliant.COM (Ben Crocker) writes:
>>Having written a Pascal compiler with an LL(1) parser generator, I can
>>vouch for the proposition that Pascal is LL(1).
>
>Such compilers are built on tokenizers with 2 character look-ahead. Remember
>that "1..5" has the same tokens as "1 .. 5", but requires 2 character
>look-ahead to distinguish from streams containing "1.<digit>".
>
>Look-ahead 2 tokenising feeding a Lx(1) parser does not demonstrate that
>Pascal is Lx(1); it demonstrates that a tokenized version of a language may
>have different look-ahead requirements than the stream-of-characters version.


Such postings should best be made from a machine with a name other than
"theory.stanford.edu" :->.


For any language with an LR(n) grammar, n > 0, there is an LR(n -1)
grammar accepting the same language. In other words, a language is
either LR or not LR; it makes no sense to say that a language is LR(k),
for any particular k. There are practical considerations, in that
the LR(n) grammar for a language may be considerably smaller and more
comprehensible than the LR(n - 1) grammar. But that's all. The original
proof of this is due to Knuth; it can no doubt be found in many sources,
certainly in Hopcroft and Ullman. (I'm away from my references, so I
can't be more precise; sorry.)


As for LL languages, in <1990Oct26.213044.28243@Neon.Stanford.EDU>,
Sriram Sankar (sankar@neon.stanford.edu) asserts:


>LL(n+1) languages are a strict superset of LL(n) languages;


As I said, I'm away from my references, and I'm not really a theoretician,
but I believe this, too, to be incorrect. I think that if there is
an LL(n + 1) grammar for a language, then there will also be some
LL(n) grammar for the language (n >= 0). Consider the following proof
sketch showing that an LL(2) grammar can be rewritten into an LL(1)
grammar (the generalization should be straightforward):


If a grammar G is LL(2), but not LL(1), then there must be productions
of the form


A --> a alpha
B --> a beta


such that both productions could occur from the same parser state. (A and B
might be the same non-terminal.) Intuitively, the parser cannot "know" which
production is the correct one with only the "a" as lookahead.


Now rewrite the grammar as follows: remove those two productions and
replace them with


A' --> alpha
B' --> beta


where A' and B' are new non-terminals not used elsewhere; also,
for any production P having an A or a B in its right-hand side,
add a new production P' that is just like P, except that every
occurrence of A on the rhs has been replaced by "a A'"; similarly
with B, substituting "a B'".


Intuitively, this defers the decision of which parse is correct by pushing
the a and encoding the fact that an a was read in the parser state.


Finally, as to whether Pascal is or is not LL(anything): I will ship a case
of Anchor Steam beer to anyone who can show that Pascal is even context-free
by writing a context-free grammar that will accept only correct Pascal
programs and reject all other strings, including strings that would be
correct Pascal programs except that an undeclared variable is used in the
program. No fair cheating by using a Turing machine (e.g., C code) to
maintain a symbol table.


jas
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.