Re: What is the trick of languages being LL(k+1), but not LL(k)?

Chris F Clark <cfc@shell01.TheWorld.com>
9 Jan 2004 23:34:15 -0500

          From comp.compilers

Related articles
What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-07)
Re: What is the trick of languages being LL(k+1), but not LL(k)? cfc@shell01.TheWorld.com (Chris F Clark) (2004-01-09)
Re: What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-12)
Re: What is the trick of languages being LL(k+1), but not LL(k)? cfc@shell01.TheWorld.com (Chris F Clark) (2004-01-12)
Re: What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-16)
Re: What is the trick of languages being LL(k+1), but not LL(k)? cfc@world.std.com (Chris F Clark) (2004-01-17)
Re: What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-22)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 9 Jan 2004 23:34:15 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 04-01-027
Keywords: parse, LL(1)
Posted-Date: 09 Jan 2004 23:34:15 EST

Oliver asked why the rule in the next line is LL(k+1)


> a : A^k B s | C ;


This sounds supsiciously like a homework problem, so I will only give
a partial answer--one that should be sufficient for you to understand,
but not sufficient to allow you to turn it in as a homework answer.


The reason why the grammar is LL(k+1) is that is takes k tokens to get
through the A's and the k+1'st token is the B. No amount of rewriting
can get rid of those k A's (and still accept the same language). I
will leave it as an exercise to you to find the rule(s) which
"conflict(s)" with the "a" rule and can also start with k A's.


That is the key concept of LL(k) processing. There must be a way
within the first k tokens of a rule to determine which rule applies.
If there are two possible rules that apply at the k-th token, then the
grammar is at least k+1.


In practice, this is not such a bad limit as most "constructs" in most
programming languages have a unique starting token that tells you the
statement type--think of BASIC and how each line consists of a line
number followed by a unique keyword for each "type" of line. Even
languages like C have only one of several keywords that can begin a
declaration and not an assignment statement. The key part there is
that one must know which identifiers represent types and which
represent variables to make that ture, and hence the common hack of
tying the lexer to the symbol table.


The one place this tends to get one in trouble is "expressions",
because in expressions the keyword (operator) is often infix (or
perhaps suffix) and you cannot tell "expression + expression" from
"expression * expression" by looking at the tokens of the first
expression (which may be of unbounded length). Thus, grammars for
expressions which are written with recursion on both the left and
right sides are not LL(k) for any k, even though they are simple for
humans to understand--the recursion on the left is the problem.
However, it is usually possible to recast those expressions into a
form where there is no left recursion, by making up names for the
different types of sub-expressions (e.g. factor, term, primary, etc.)
and making them not left recursive.


To assure yourself that you understand the problem, is the reverse of
the language LL(k), and if so, for what k.


s: a s A | ;
a: s B A^k | C;


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
19 Bronte Way #33M voice : (508) 435-5016
Marlboro, MA 01752 USA fax : (508) 251-2347 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.