Re: How test if language is LL(k) or LR(k) ?

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Mon, 10 Nov 2008 17:54:35 +0100

From comp.compilers

Related articles
[2 earlier articles]
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-10-29)
Re: How test if language is LL(k) or LR(k) ? rpboland@gmail.com (Ralph Boland) (2008-10-31)
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-06)
Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? ulf.schwekendiek@googlemail.com (ulf.schwekendiek@googlemail.com) (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-07)
*Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk* (2008-11-10)**
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-10)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-12)

| List of all articles for this month |

From:	torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Newsgroups:	comp.compilers
Date:	Mon, 10 Nov 2008 17:54:35 +0100
Organization:	Department of Computer Science, University of Copenhagen
References:	08-10-044 08-10-055 08-11-033 08-11-035
Keywords:	parse, LR(1)
Posted-Date:	10 Nov 2008 15:25:03 EST

Stephen Horne <sh006d3592@blueyonder.co.uk> writes:

> On Fri, 07 Nov 2008 14:27:18 +0100, torbenm@pc-003.diku.dk (Torben AEgidius Mogensen) wrote:
>
>>Stephen Horne <sh006d3592@blueyonder.co.uk> writes:
>>
>>> For LR, *almost* all context-free grammars are LR(k < infinity), so if
>>> you can prove that the grammar is context-free you are almost there.
>>
>>That's not very useful, and not even true. There are many useful
>>grammars that are not LR(k) for any k and even useful languages for
>>which there doesn't exits LR(k) grammars for any k.
>
> Context-free grammars? And for any finite k?

Indeed. The language of all even palindromic sequences over the
alphabet {a,b} is not LR(k) for any k. Note that this has an
unambiguous grammar:

    P -> aPa
    P -> bPb
    P ->

> My understanding is that you'd need an infinite input. While there are
> certainly applications where unbounded inputs occur, a genuinely
> infinite input can never be completely parsed anyway.

It is true that for any particular input, you need only finite
lookahead, but for a grammar to be LR(k), the same k can be used for
all inputs.

> I'm not saying I'm right here, as I'm certain you know more than me,
> but I'd like to see an example.
>
>>> You can always construct an LR(1) - or even LR(0) - state model to
>>> parse a context-free grammar, but if the grammar is not LR(1) that
>>> state model will have shift/reduce or reduce/reduce conflicts.
>>>
>>> Almost all such conflicts can be resolved finitely. This is the logic
>>> behind Tomita and other generalised LR parsing algorithms. The basic
>>> principle is that conflicts are resolved by checking all possibilities
>>> - the Tomita approach being to check them concurrently using stack
>>> duplication, whereas another common approach is to use backtracking.
>>
>>This does, however, not give you any k, since the amount you have to
>>read ahead before backtracking may depend on the input.
>
> My claim is that almost all conflicts can be resolved finitely.

If you have an ambiguous grammar, this is certainly not true. But if
the grammar is unambiguous, there is by definition only one possible
parse tree for a string, and since there exist algorithms for finding
this, you can say that you resolve the conflict by finite lookahead.
But, again, this lookahead is input-dependent.

> I never claimed that this tells me the value of k. A method for
> determining k was vaguely described later in the post.

Vaguely indeed, as in wrong.

> Of course formally I'm wrong, because I'm counting in terms of what is
> useful rather than the far more numerous grammars that aren't.

How do you determine if a grammar is useful?

> Anyway, the main limitation on finite resolution is that the input
> itself must be finite. That's no cheat since any input that can be
> parsed completely must be finite. Obviously, formally, its a
> limitation - but if you're doing state-so-far parsing of infinite
> input streams, large lookaheads (often any lookaheads) are
> inappropriate anyway, so it didn't seem relevant.
>
> I probably should have mentioned this, of course.

I never talked about infinite input, so this is irrelevant.

>>> I have designed an variant of GLR (though I have never implemented it)
>>> which should theoretically give constant-time parsing of any context
>>> free grammar
>>
>>I assume you mean "linear-time", as constant-time parsing sounds
>>implausible.
>
> Yes - sorry - I meant constant time per token of input. For some
> variants (depending on how the table is managed and evaluated) that's
> amortised constant time per token of input.
>
>> But even linear-time parsing for any context-free
>>grammar sounds suspect and would be a very big result (if you can
>>prove it).
>
> 1. As I said, there's a pretty big proviso in there, related to the
> halting problem. The only improvement in the power of this
> relative to backtracking/Tomita is that it can detect problems for
> specific inputs at run-time.
>
> Actually, there is a proviso to that proviso. My method doesn't
> need reduce sizes to be encoded in the state model - they are
> determined at run-time by the table. This makes it easy to support
> EBNF definitions for nonterminals. There's no extra power in terms
> of the set-of-strings definition of a grammar, but it's a little
> more flexible in how you get to define the grammar.

I just don't understand this description. Can you write it as
pseudo-code?

> 2. As I said, at best I re-invented it ten years after the fact.
> Formal papers have already been published for tabular LR parsing.

Indeed. But while they claim linear-time parsing for tables without
conflicts, there is no claim for tables with conflicts.

> I also suggest some tricks such as a kind of sliding window for
> the table for handling unbounded inputs, but that leads to further
> provisos. Truth told, even ignoring the cycle issues, you can
> only ensure linear time parsing of any CF grammar using my method
> if you know the full input up-front, so that you can use a
> preallocated table.

Knowing the full input up-front is no problem, but if you make a table
specifically to this input, you must show that this can be constructed
in linear time and that subsequent parsing using this table is also in
linear time.

> That said, I believe that a proof that general CF parsing is
> linear time was described here some years back. I started a thread
> in 2002 relating to linear time parsing (my idea back then was
> broken), and one of the replies referred to Kolomogorov complexity
> theory. 6 years later, and I still haven't done the reading to
> understand that.

I believe linear-time CF parsing is still an open problem.

> 3. A lot of the costs have big constant factors. The cycle checks for
> a start. *Very* unlikely to be mainstream. Potentially useful for
> niche applications, I suppose, but even then I have my doubts.

For just proving linear-time, big constant factors don't matter - as
long as they are constants independent of the input size.

> My notes are in a 12-page approx. 340K OpenDocument Text file. The
> real explanation of the method takes about 3 pages, the rest is on
> variations and other general notes.

If you really have an algorithm for linear-time CF parsing, send a
paper to a journal to get it reviewed. If you get it past the
reviewers, I might be persuaded to read your article. There are just
too many "I have this wonderful solution to (open problem), which I
have described in (long and impenetrable text)" claims out there for
me to want to read one more.

Torben

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: How test if language is LL(k) or LR(k) ?

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)Mon, 10 Nov 2008 17:54:35 +0100

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Mon, 10 Nov 2008 17:54:35 +0100