Re: Parsing with infinite lookahead

"Terence Parr" <>
Fri, 25 Feb 1994 18:33:53 GMT

          From comp.compilers

Related articles
Parsing with infinite lookahead (Matt Timmermans/MSL) (1994-02-22)
Re: Parsing with infinite lookahead (1994-02-24)
Parsing with infinite lookahead (Stephen J Bevan) (1994-02-24)
Re: Parsing with infinite lookahead (1994-02-24)
Re: Parsing with infinite lookahead (Terence Parr) (1994-02-25)
Re: Parsing with infinite lookahead corbett@lupa.Eng.Sun.COM (1994-02-26)
Re: Parsing with infinite lookahead (1994-02-27)
Re: Parsing with infinite lookahead (1994-02-28)
Re: Parsing with infinite lookahead (1994-03-01)
Re: Parsing with infinite lookahead (1994-03-02)
Re: Parsing with infinite lookahead (1994-03-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: "Terence Parr" <>
Keywords: parse, theory, bibliography
Organization: Compilers Central
References: 94-02-174
Date: Fri, 25 Feb 1994 18:33:53 GMT

Matt Timmermans/MSL <> writes:

> Does anyone know of a deterministic method for parsing any unambiguous
> context-free grammar -- including those that require infinite lookahead?
                              ^^^^^^^ "language" (subtle, but important point)
> For example:
> S: A a | B b
> A: c | A c
> B: c | B c

I think the Early Algorithm (in O(n^3) time?) parses any unambiguous
context-free language. However, most parser-generators are restricted to
the LALR(1), LR(1), or LL(1) grammars and, hence, cannot handle all
context-free languages.

Much work has been done in theory, and some research tools have been
built, that extend LR-based parsers to use regular expressions as
lookahead rather than a single symbol. This is very very cool stuff. See
[CuC73] and [BeS90].

In practice, the only commonly-available tool that I know of that uses
more than a single symbol of lookahead is ANTLR (the parser-generator of
PCCTS). It accepts LL(k) for k>1 grammars and has a nifty feature called
a syntactic predicate [PaQ94] which indicates a syntactic context that
must be satisfied before application of an associated production is
authorized. Removing left-recursion and using the EBNF notation of ANTLR,
I get the following grammar:

s : (c)+ a
    | (c)+ b

This is non-LL(k) for any finite k (assuming that we cannot left-factor to
keep it interesting). However, I may use a syntactic predicate to specify
what context must be seen for the first production to be seen (the second
would be attempted by default upon failure of the predicate):

s : ((c)+ a)? (c)+ a
    | (c)+ b

As a shorthand, we allow:

s : ((c)+ a)?
    | (c)+ b

which says "I'm not sure if production one will match; give it a try."
Basically, the parser scans ahead in the infinite lookahead buffer to see
if an 'a' follows one or more 'c's.

Note that a syntactic predicate is predicting a production with a
context-free language which is stronger than predicting with a regular

ANTLR-generated parsers can recognize all context-free languages if you're
willing to use syntactic predicates all over the place. However, the
majority of the decisions in most grammar are deterministic and, hence,
(...)? predicates are rarely needed, yielding near-linear parse

Terence Parr


[CuC73] Karel Culik II, and Rina Cohen, ``LR-Regular Grammars--an
Extension of LR(k) Grammars,'' Journal of Computer and System
Sciences 7, 1973, pp 66-96.

[BeS90] Manuel E. Bermudez and Karl M. Schimpf, ``Practical Arbitrary
Lookahead LR Parsing,'' Journal of Computer and System
Sciences 41, 1990, pp 230-250.

[PaQ94] Terence J. Parr and Russell W. Quong, ``Adding Semantic and
Syntactic Predicates to LL(k) -- pred-LL(k).'' To Appear in
Proceedings of the International Conference on Compiler;
Edinburgh, Scotland; April 1994.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.