Re: How test if language is LL(k) or LR(k) ?

Stephen Horne <sh006d3592@blueyonder.co.uk>
Wed, 29 Oct 2008 14:17:38 +0000

          From comp.compilers

Related articles
How test if language is LL(k) or LR(k) ? a.moderacja@gmail.com (Borneq) (2008-10-23)
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-10-28)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-10-29)
Re: How test if language is LL(k) or LR(k) ? rpboland@gmail.com (Ralph Boland) (2008-10-31)
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-06)
Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? ulf.schwekendiek@googlemail.com (ulf.schwekendiek@googlemail.com) (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-10)
[2 later articles]
| List of all articles for this month |

From: Stephen Horne <sh006d3592@blueyonder.co.uk>
Newsgroups: comp.compilers
Date: Wed, 29 Oct 2008 14:17:38 +0000
Organization: virginmedia.com
References: 08-10-044
Keywords: parse, theory
Posted-Date: 29 Oct 2008 19:27:12 EDT

On Thu, 23 Oct 2008 13:13:12 -0700 (PDT), Borneq
<a.moderacja@gmail.com> wrote:


>Where k is not specified.
>I would not testing if is LL(1),LL(2),LL(3) to infinity but test if
>language if LL(k) and would be nice if algorithm return k, find k for
>LL(k) LR(K) in finite time.


For LL, I don't know much.


For LR, *almost* all context-free grammars are LR(k < infinity), so if
you can prove that the grammar is context-free you are almost there.


You can always construct an LR(1) - or even LR(0) - state model to
parse a context-free grammar, but if the grammar is not LR(1) that
state model will have shift/reduce or reduce/reduce conflicts.


Almost all such conflicts can be resolved finitely. This is the logic
behind Tomita and other generalised LR parsing algorithms. The basic
principle is that conflicts are resolved by checking all possibilities
- the Tomita approach being to check them concurrently using stack
duplication, whereas another common approach is to use backtracking.


The basic problem is that sometimes, there are infinite possibilities
to check. This can occur when the grammar contains empty rules - when
there are reductions that pop zero tokens from the stack. However, it
doesn't happen for *all* cases where empty rules are used - except in
the case of LR(0), I think - and it is not reasonable to ban such
rules.


In such cases, Tomita will give an infinite number of stack
duplicates, whereas the backtracking approach will simply give an
infinite loop.


Longer lookaheads can bypass most cases where these loops can happen,
and even LR(1) can handle most practical grammars with empty rules.
However, in general, it is not decidable for a particular grammar
whether such loops can be avoided using longer lookaheads. It is only
decidable if you also know the specific input that will be parsed.


I have designed an variant of GLR (though I have never implemented it)
which should theoretically give constant-time parsing of any context
free grammar that can be parsed, but which (unlike Tomita or
backtracking) can detect these infinite loops at run-time. I say "I
have designed", but it's not my invention - I was beaten to it by
about a decade. I was inspired by the packrat approach to LL parsers,
and designed a memoized/dynamic/tabular LR parsing algorithm, but once
I'd got the basic idea, it was also easy to find certain papers by
Mark-Jan Nederhof et al, published in the mid-90s.


Detecting the infinite loops in this tabular approach is not trivial,
but is constant-time. It involves a digraph cycle check for one column
of the table. This is constant-time because there is a maximum digraph
size, which is decided by the state model and thus the grammar. There
is no rule to say that constant-time must be fast ;-)


Basically, that's it - *almost* any context-free grammar is LR(k <
infinity) but there are a few that are not, and there are some
grammars which cannot be proven either LR(k) or otherwise except by
iterating through the infinite set of possible inputs.


For those grammars that are LR(k), k can be found by starting with an
LR(1) state model and identifying the states at which conflicts exist.
At each of these states, extend the model by adding "lookahead" states
until all the conflicts are resolved. That is, the actions for the
transitions are neither "shift" nor "reduce" but a kind of lookahead
pseudo-shift that adds nothing to the stack. In a sense, this is
building the stack duplication/backtracking into the state model
itself. The maximum number of lookahead steps needed to resolve all
conflicts determines k, and a genuine LR(k) model can then be derived.


This always has the possibility of an infinite loop growing an
infinite state model due to those undecided non-LR(k) cases, however.
My understanding is that this limitation cannot be overcome since
doing so would provide a solution to the halting problem, which is
proven non-decidable.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.