2 Jun 1999 01:38:39 -0400

Related articles |
---|

Types of grammars. Luke@comodo.techi.com.force9.net (Luke) (1998-08-19) |

Re: Types of grammars. cfc@world.std.com (Chris F Clark) (1998-08-19) |

Re: Types of grammars. cfc@world.std.com (Chris F Clark) (1998-08-20) |

Re: Types of grammars. adrian@dcs.rhbnc.ac.uk (1998-08-24) |

Re: Types of grammars. sergenth@tin.it (1999-05-20) |

Re: Types of grammars. hunk@alpha1.csd.uwm.edu (1999-06-02) |

From: | hunk@alpha1.csd.uwm.edu (Mark William Hopkins) |

Newsgroups: | comp.compilers |

Date: | 2 Jun 1999 01:38:39 -0400 |

Organization: | University of Wisconsin - Milwaukee, Computing Services Division |

References: | 98-08-134 99-05-080 |

Keywords: | parse, LL(1), LR(1) |

sergenth@tin.it writes:

*>A grammar may be LL(k) if it satisfies some condition, other condition are*

*>needed for a grammar to be LR(k).*

*>It turns out that a grammar that is LL(k) also satisfies the conditions to*

*>be LR(k).*

*>*

*>For the languages defined by the grammars, you have that every language*

*>defined by a grammar LL(k) may also be defined by a grammar LR(1).*

Actually, a context-free grammar is LL(k) (or LR(k)) if its

'canonical' extension to a simple syntax directed transduction (SSDT)

is LL(k) (or LR(k)). The canonical extension is the one that is

defined by an SSDT grammar which is formed by suffixing each rule of

the original context-free grammar by an distinct output symbol.

Note that the attributes LL(K) and LR(k) are actually attributes of

SSDT's, not of context-free languages (or CFL's). Two grammars

defining the same CFL may have completely different canonical

extensions to SSDT's. Indeed, the same grammar could have different

extensions to SSDT's if output symbols are placed at different points

within the rules, other than at the end. So the attributes are really

not even meaningful of context free grammars either.

This is both important and relevant because a parser is actually a a

processor for an SSDT, not a processor for a CFL. The formalism,

methodology and theory of parsing is based on the former not the

latter.

The more general theorems would be that:

(A) Every LL(k) SSDT is also LR(k)

(B) Every LL(k) SSDT is LR(1) (which I'm not sure is actually true)

(C) Every LR(k) SSDT reduces to an equivalent LR(1) SSDT (almost certainly

false).

A 'look-ahead' in an SSDT corresponds to commuting an input symbol with an

output symbol (yx -> xy, x is an input symbol, y an output symbol). This

corresponds to 'deferring the action y until after looking ahead to x'.

An SSDT is deterministic if it can be defined by a set of rules, all headed

by input symbols, with no two rules for a given non-terminal being headed

by the same input symbol. The k in LR(k) is probably directly related to

the maximum depth of transpositions that have to be made to render an SSDT

deterministic.

It'll get more complex than this, since you'll have to make algebraic

substitutions for those rules that are left-recursive in order to

eliminate all the left-recursions. This is what's done effectively and

systematically by the LR(0) construction with an end-marker.

If you're working directly with SSDT's (retaining the output symbols

explicitly in the rules) then all you need to do is the LR(0) construction.

*>From this, by making yx->xy transpositions, you can systematically*

increment the parser up to LR(k).

For example, the grammar

S -> A a | b A c | B c | b B a

A -> d

B -> d

has the following canonical extension, which is strictly LR(1) (not even

LALR(1)):

S -> A a y1 | b A c y2 | B c y3 | b B a y4

A -> d y5

B -> d y6

The example is easy enough to illustrate the transformation without

explicitly resorting to LR(0) construction. The grammar is equivalent to

S -> A a y1 | B c y3 | b T

T -> A c y2 | B a y4

A -> d y5

B -> d y6

If A and B are substituted for, the result is this:

S -> d y5 a y1 | d y6 c y3 | b T

T -> d y5 c y2 | d y6 a y4

A -> d y5

B -> d y6

which is equivalent to:

S -> d U | b T

T -> d V

U -> y5 a y1 | y6 c y3

V -> y5 c y2 | y6 a y4

This is basically what the LR(0) construction would give you (actually,

an order of magnitude simpler, but equal in 'determinicity')

One level of transpositions is required to make this deterministic:

S -> d U | b T

T -> d V

U -> a y5 y1 | c y6 y3

V -> c y5 y2 | a y6 y4

so the SSDT is LR(1). The significance is that you have to look ahead to

the symbol a or c to determine whether you need to perform (for rule U)

actions y5-y1 (corresponding to the reductions A->d and S -> Aa) or

actions y6-y3 (corresponding to the reductions B->d and S -> Bc).

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.