Re: Types of grammars.

hunk@alpha1.csd.uwm.edu (Mark William Hopkins)
2 Jun 1999 01:38:39 -0400

From comp.compilers

Related articles
Types of grammars. Luke@comodo.techi.com.force9.net (Luke) (1998-08-19)
Re: Types of grammars. cfc@world.std.com (Chris F Clark) (1998-08-19)
Re: Types of grammars. cfc@world.std.com (Chris F Clark) (1998-08-20)
Re: Types of grammars. adrian@dcs.rhbnc.ac.uk (1998-08-24)
Re: Types of grammars. sergenth@tin.it (1999-05-20)
*Re: Types of grammars. hunk@alpha1.csd.uwm.edu* (1999-06-02)**

| List of all articles for this month |

From:	hunk@alpha1.csd.uwm.edu (Mark William Hopkins)
Newsgroups:	comp.compilers
Date:	2 Jun 1999 01:38:39 -0400
Organization:	University of Wisconsin - Milwaukee, Computing Services Division
References:	98-08-134 99-05-080
Keywords:	parse, LL(1), LR(1)

sergenth@tin.it writes:
>A grammar may be LL(k) if it satisfies some condition, other condition are
>needed for a grammar to be LR(k).
>It turns out that a grammar that is LL(k) also satisfies the conditions to
>be LR(k).
>
>For the languages defined by the grammars, you have that every language
>defined by a grammar LL(k) may also be defined by a grammar LR(1).

Actually, a context-free grammar is LL(k) (or LR(k)) if its
'canonical' extension to a simple syntax directed transduction (SSDT)
is LL(k) (or LR(k)). The canonical extension is the one that is
defined by an SSDT grammar which is formed by suffixing each rule of
the original context-free grammar by an distinct output symbol.

Note that the attributes LL(K) and LR(k) are actually attributes of
SSDT's, not of context-free languages (or CFL's). Two grammars
defining the same CFL may have completely different canonical
extensions to SSDT's. Indeed, the same grammar could have different
extensions to SSDT's if output symbols are placed at different points
within the rules, other than at the end. So the attributes are really
not even meaningful of context free grammars either.

This is both important and relevant because a parser is actually a a
processor for an SSDT, not a processor for a CFL. The formalism,
methodology and theory of parsing is based on the former not the
latter.

The more general theorems would be that:

(A) Every LL(k) SSDT is also LR(k)
(B) Every LL(k) SSDT is LR(1) (which I'm not sure is actually true)
(C) Every LR(k) SSDT reduces to an equivalent LR(1) SSDT (almost certainly
false).

A 'look-ahead' in an SSDT corresponds to commuting an input symbol with an
output symbol (yx -> xy, x is an input symbol, y an output symbol). This
corresponds to 'deferring the action y until after looking ahead to x'.
An SSDT is deterministic if it can be defined by a set of rules, all headed
by input symbols, with no two rules for a given non-terminal being headed
by the same input symbol. The k in LR(k) is probably directly related to
the maximum depth of transpositions that have to be made to render an SSDT
deterministic.

It'll get more complex than this, since you'll have to make algebraic
substitutions for those rules that are left-recursive in order to
eliminate all the left-recursions. This is what's done effectively and
systematically by the LR(0) construction with an end-marker.

If you're working directly with SSDT's (retaining the output symbols
explicitly in the rules) then all you need to do is the LR(0) construction.
>From this, by making yx->xy transpositions, you can systematically
increment the parser up to LR(k).

For example, the grammar

S -> A a | b A c | B c | b B a
A -> d
B -> d

has the following canonical extension, which is strictly LR(1) (not even
LALR(1)):

S -> A a y1 | b A c y2 | B c y3 | b B a y4
A -> d y5
B -> d y6

The example is easy enough to illustrate the transformation without
explicitly resorting to LR(0) construction. The grammar is equivalent to

S -> A a y1 | B c y3 | b T
T -> A c y2 | B a y4
A -> d y5
B -> d y6

If A and B are substituted for, the result is this:

S -> d y5 a y1 | d y6 c y3 | b T
T -> d y5 c y2 | d y6 a y4
A -> d y5
B -> d y6

which is equivalent to:

S -> d U | b T
T -> d V
U -> y5 a y1 | y6 c y3
V -> y5 c y2 | y6 a y4

This is basically what the LR(0) construction would give you (actually,
an order of magnitude simpler, but equal in 'determinicity')

One level of transpositions is required to make this deterministic:

S -> d U | b T
T -> d V
U -> a y5 y1 | c y6 y3
V -> c y5 y2 | a y6 y4

so the SSDT is LR(1). The significance is that you have to look ahead to
the symbol a or c to determine whether you need to perform (for rule U)
actions y5-y1 (corresponding to the reductions A->d and S -> Aa) or
actions y6-y3 (corresponding to the reductions B->d and S -> Bc).

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Types of grammars.

hunk@alpha1.csd.uwm.edu (Mark William Hopkins)2 Jun 1999 01:38:39 -0400

hunk@alpha1.csd.uwm.edu (Mark William Hopkins)
2 Jun 1999 01:38:39 -0400