Re: LL vs LR languages (not grammars)

schoebel@minnie.informatik.uni-stuttgart.de (Thomas Schoebel-Theuer)
5 Jul 1996 12:00:42 -0400

          From comp.compilers

Related articles
LL vs LR languages (not grammars) platon!adrian@uunet.uu.net (1996-06-24)
Re: LL vs LR languages (not grammars) leichter@smarts.com (Jerry Leichter) (1996-06-26)
Re: LL vs LR languages (not grammars) ikastan@alumnae.caltech.edu (1996-06-30)
Re: LL vs LR languages (not grammars) fjh@mundook.cs.mu.OZ.AU (1996-06-30)
Re: LL vs LR languages (not grammars) schoebel@minnie.informatik.uni-stuttgart.de (1996-07-01)
Re: LL vs LR languages (not grammars) ikastan@alumnae.caltech.edu (1996-07-01)
Re: LL vs LR languages (not grammars) ikastan@alumnae.caltech.edu (1996-07-02)
Re: LL vs LR languages (not grammars) schoebel@minnie.informatik.uni-stuttgart.de (1996-07-05)
| List of all articles for this month |
From: schoebel@minnie.informatik.uni-stuttgart.de (Thomas Schoebel-Theuer)
Newsgroups: comp.compilers,comp.compilers.tools.pccts
Date: 5 Jul 1996 12:00:42 -0400
Organization: Department of Computer Science, University of Stuttgart, Germany
References: 96-06-103 96-06-109 96-07-012 96-07-027
Keywords: parse

|> Thomas Schoebel-Theuer <schoebel@minnie.informatik.uni-stuttgart.de> wrote:
|> >LL(oo) means to always detect the right alternative corresponding to a
|> >*uniquely determined* subtree of *the* syntax tree, if there is exactly one.
|> >In other words, if the grammar is unambiguous (or more precisely if the
|> >input word has exactly one parse), you will always choose the right
|> >alternatives corresponding to this one tree *in advance*, because you have
|> >infinite lookahead to detect this. Conversely, if there is more than one
|> >tree, you will always get a lookahead conflict.
|>
|>
|> With this definition, LL(oo) seems to be indulging in what one might
|> call 'focused nondeterminism'. It is possible to simply posit the defi-
|> nition and work with it... but since it is somewhat conclusory, an attempt
|> at PDA implementation is: allow nondeterminism, say using the backtracking
|> model, but make it exhaustive. That is, continue exploring choices even
|> after a successful computation, instead of announcing it. If it proves to
|> be the only one, then and only then accept.
|>
|> The case of the perfectionist backtracker.
|>
|> It is clearer under the replicant model, the PDA spawning clones at
|> every instance of a choice. Each clone reaches a conclusion (assuming
|> Greibach normal form, maybe) and the presence of two or more "accept"s
|> blows a fuse.
|>
|> Ignoring the details of LL or LR, does this properly describe
|> infinite lookahead?




Yes, very well.
You got the point.


You do not need exhaustive backtracking, you can use dynamic programming
additionally. By using the compression theorem, you can make the
configuration tree of the backtracking parser into a graph, which
corresponds to Earleys data structure nearly 1:1. Then you get O(n^2) time
and O(n) space for any unambiguous grammar. To test the unambiguity, you
just have to add a garbage collector to eliminate dead ends (which is
basically described in the Graham/Harrison/Ruzzo 1980 paper). You can even
get O(n) if the grammar is LR(k) for some k (which you don't need to know in
advance!) and if you use a right recursion elimination technique described
in my forthcoming thesis. The basic idea is in an article by Leo, but Leos
algorithm doesn't work in O(n) with certain LR(k) grammars having _hidden_
right recursion.


|> >In other words, since LL(oo) comprises the class of unambiguous grammars,
|> >it is strongly more powerful than deterministic methods.
|>
|> In a prior posting I suggested a simpler kind of lookahead -- and
|> weaker than the one above. Rephrasing it: the PDA enters a special state
|> and places a marker under the reading head (or: unconsumed input carries
|> a special starting symbol %). From then on it can read input without
|> consuming it, and can make transitions into subspecial states, but cannot
|> use the stack. There is a special state for returning to normal mode,
|> upon reading the marker (or the %) and with more than one transition (to
|> "report" the outcome of this foray). In short, 2-way FA operation. We
|> could call this LLL, lookie-loo lookahead.
|>
|> It can carry out some lookahead tasks beyond fixed k. I don't know
|> whether it corresponds to something known.


It is *regular* lookahead, as opposed to LL(oo) which uses full CF lookahead.
Suppose your stack has contents (A alpha $) where $ is bottom-of-stack. Say
you have to try two alternatives A -> beta and A -> gamma. You would have
to test whether the intersection of all prefixes of L(alpha beta $) with all
prefixes of L(alpha gamma $) is *regular*. This is undecidable in general.
You have to solve this problem in order to determine whether some word
ending with $ is in the intersection. If $ is in, the intersection of the
two languages is nonempty, and there is a lookahead conflict. To determine
with a FA that $ is not in, you basically have to precompute the
intersection language into a FA.


If you can decide the intersection problem in some cases (perhaps with a
probabilistic algorithm), you can build your FA to decide which alternative
is the right one.


So your problem is that membership in LLL is not statically decidable, as is
also the problem with LL(oo). You probably can look for useful subclasses
which are decidable, in order to allow precomputation of your FA.


Note that the full CF lookahead necessary for LL(oo) can be achieved nearly
without overhead, since the parsing algorithm for testing the lookahead at
runtime can be the same as is used for parsing anyway. You just have to
delay the final *parsing decision* as long as some ambiguities are not yet
removed, in worst case until the EOF is seen.


|> >An example: First look at the language L = { a^n b^n c^m } U { a^n b^m c^m },
|> [ . . . ]
|> >Now we modify the language by appending a single letter in both
|> >cases: L' = { a^n b^n c^m d } U { a^n b^m c^m e }.
|> LLL can do it, too, obviously enough. It only needs to look -- go
|> out and see the 'd' or the 'e'.
|> But LLL is less than LL(oo): one can change the example and foil
|> lookie-loos. Replace 'd' by f^i g^i, and 'e' by f^i g^(i+1), say.
|> Now it takes work to 'read' the criterion.


Good example!


So I'd suggest to use a backtracking parser with dynamic programming and
right recursion elimination, so you will get O(n) with most practical
grammars (close to LR(k)) and O(n^2) in rare worst cases, and you have the
full power of LL(oo). Trying to precompute everything to a deterministic
algorithm leads to statically undecidable classes very soon, so you would
have to restrict yourself to subclasses of LL(oo). As my opinion,
precomputing (for example the macro technique described in my paper) has a
very good use for improving algorithms by constant factors, but cannot solve
the general problem of statically decidable grammar classes, although it can
be used for characterizations of such classes (for example if a grammar can
be macroized by a particular packing strategy such that the resulting tree
generator is deterministic).


-- Thomas


P.S.: The reprint of my article is also available under
ftp://ftp.informatik.uni-stuttgart.de/pub/library/ncstrl.ustuttgart_fi/TR-1996-06/


Leo's article is


@article{Leo91,
    author = {Joop M. I. M. Leo},
    title = {A general context-free parsing algorithm running in linear time
                      on every LR(k) grammar without using lookahead},
    journal = {Theoretical Computer Science},
    volume = {82},
    pages = {165--176},
    year = {1991}
}
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.