
From: weberwu@tfh-berlin.de (Debora Weber-Wulff)
Newsgroups: comp.compilers
Date: 22 Dec 1995 11:14:18 -0500
Organization: TFH-Berlin (Berlin, Germany)
Keywords: LR(1), parse

The LL vs. LR thread has been helpful, but I still want to understand

the differences between the various LR schemes.

I'm trying to get the differences between the LR-parser types sorted
out in my head and fitted onto one sheet of paper. After perusing a
number of books and papers - Aho, ([Sethi,] Ullman) + Johnson; Sippu,
Soisalon-Soininen; Mayer; DeRemer; Knuth; Pager; Bermudez; Geller,
Harrison - I am a tad confused. Each uses its own sort of definitions,
and I am not really sure of the differences. This is what I hope I've
understood:

canonical LR(k) (k > 0) grammars:

A parsing table can be created from such a grammar that pre-computes
the k-token lookahead into the states by a magic process. This makes
life easy on the parser, which doesn't have to worry about lookahead
or lookback schemes, but the tables tend to be "enormous" (although in
these days, since Microsoft has conditioned us to accept bloated
software and our machines have a good few MB on board, the spectre of
a 20000-entry table doesn't really scare me - should it?)
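Once the lookahead has been baked into the states, the driver itself
is tiny. Here is a minimal sketch of a table-driven LR driver for the
toy grammar 0: S' -> E, 1: E -> E + n, 2: E -> n; the ACTION/GOTO
tables below were computed by hand for this hypothetical example, not
by any particular generator:

```python
# Hand-computed tables for the toy grammar above. The lookahead logic
# lives entirely in the tables; the driver just shifts and reduces.
ACTION = {  # (state, lookahead) -> ("shift", state) | ("reduce", prod) | ("accept",)
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3), (1, "$"): ("accept",),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
PRODS = {1: ("E", 3), 2: ("E", 1)}   # production -> (lhs, rhs length)

def parse(tokens):
    stack = [0]                      # stack of states only
    tokens = tokens + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False             # syntax error
        if act[0] == "shift":
            stack.append(act[1]); i += 1
        elif act[0] == "reduce":
            lhs, length = PRODS[act[1]]
            del stack[-length:]      # pop one state per rhs symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True              # accept

print(parse(["n", "+", "n"]))        # True
print(parse(["n", "n"]))             # False
```

Note that the same driver serves LR(0), SLR, LALR and canonical LR
alike; only the way the tables are filled in differs.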

canonical LR(0) grammars:

Nice 'n easy to make the table here because there's no lookahead. One
constructs a non-deterministic finite state automaton whose states are
the items constructed from the productions, with the appropriate
transitions. Then an application of the subset construction (taking
closures of item sets) squeezes an equivalent deterministic automaton
out of the NDFSA. Unfortunately, interesting constructs for
programming languages tend to provoke reduce-reduce or shift-reduce
conflicts, rendering this type of table useless.
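The closure and goto steps of that construction can be sketched in a
few lines. This uses the same hypothetical toy grammar as before,
with items represented as (production index, dot position) pairs:

```python
# A sketch of the LR(0) item construction for the toy grammar
#   0: S' -> E    1: E -> E + n    2: E -> n
GRAMMAR = {
    0: ("S'", ("E",)),
    1: ("E", ("E", "+", "n")),
    2: ("E", ("n",)),
}
NONTERMS = {"S'", "E"}

def closure(items):
    """If the dot sits before a nonterminal, add that nonterminal's
    productions with the dot at the start, until nothing changes."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (p, dot) in list(items):
            rhs = GRAMMAR[p][1]
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for q, (lhs, _) in GRAMMAR.items():
                    if lhs == rhs[dot] and (q, 0) not in items:
                        items.add((q, 0)); changed = True
    return frozenset(items)

def goto(state, sym):
    """Advance the dot over sym, then close the result."""
    moved = {(p, dot + 1) for (p, dot) in state
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == sym}
    return closure(moved)

start = closure({(0, 0)})            # the initial deterministic state
print(sorted(start))                 # all three productions, dot at 0
print(sorted(goto(start, "E")))
```

Iterating goto over every symbol from every discovered state yields
the full canonical collection, i.e. the deterministic automaton.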

Simple LR(1) grammars:

Takes the canonical LR(0) table and slaps on an easy-to-calculate
FOLLOW set to resolve the conflicts based on the next token in the
input (lookahead of 1). Unfortunately, FOLLOW carries no information
about left context. It can handle most interesting programming
language constructs, but differentiating procedure calls from
identifiers in the last position of an expression, in a language with
statement terminators, is impossible because the FOLLOW set is too
big.
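The FOLLOW computation itself really is easy. A minimal sketch,
restricted to grammars without epsilon productions to keep it short
(the full algorithm also threads nullability through FIRST), again on
the hypothetical toy grammar:

```python
# FOLLOW(A): every terminal that can appear immediately after A in
# some sentential form. No epsilon productions, so FIRST is simple.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]
NONTERMS = {"S'", "E"}

def first(sym):
    if sym not in NONTERMS:
        return {sym}                 # a terminal is its own FIRST set
    out = set()
    for lhs, rhs in GRAMMAR:
        if lhs == sym and rhs[0] != sym:   # skip direct left recursion
            out |= first(rhs[0])
    return out

def follow(start="S'"):
    fol = {a: set() for a in NONTERMS}
    fol[start].add("$")              # end-of-input follows the start symbol
    changed = True
    while changed:                   # iterate to a fixed point
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                # whatever follows sym: FIRST of the next symbol, or
                # FOLLOW(lhs) when sym is last in the production
                new = first(rhs[i + 1]) if i + 1 < len(rhs) else fol[lhs]
                if not new <= fol[sym]:
                    fol[sym] |= new; changed = True
    return fol

print(follow()["E"])                 # both '$' and '+' can follow E
```

The SLR table then says "reduce by A -> w" in a conflicted LR(0)
state exactly when the lookahead token is in FOLLOW(A) - one global
set per nonterminal, which is precisely why it is sometimes too big.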

LALR(1) grammars:

For these grammars we can save the day by calculating somewhat more
refined FOLLOW sets - one per state, rather than one global set per
nonterminal. This covers just about everything we need, which is why
yacc likes LALR(1) grammars. The only problem is with errors: an
error may not be detected at the earliest possible point (i.e. the
point at which the canonical LR(1) table would scream); a sequence of
reductions may take place first. However, since no further shifting
takes place before the error is caught, it doesn't really matter much.
There are some grammars, however, that are LR(1) but not LALR(1),
because the merging of states with identical cores will introduce a
reduce-reduce conflict.
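That merging problem can be seen concretely in the textbook example
of a grammar that is LR(1) but not LALR(1):

    S -> a E c | a F d | b F c | b E d      E -> e      F -> e

The canonical LR(1) states reached after "a e" and after "b e" share
the core {E -> e., F -> e.} but have disjoint lookaheads, so each is
conflict-free on its own. A small sketch (hand-written item sets, not
generator output) showing that merging them unions the lookaheads and
creates a reduce-reduce conflict:

```python
# Each state is a set of (item, lookahead-set) pairs, written by hand
# from the grammar above.
state_after_ae = {("E -> e.", frozenset("c")), ("F -> e.", frozenset("d"))}
state_after_be = {("E -> e.", frozenset("d")), ("F -> e.", frozenset("c"))}

def conflicts(state):
    """Terminals on which two distinct reductions compete."""
    seen, clash = {}, set()
    for item, lookahead in state:
        for t in lookahead:
            if t in seen and seen[t] != item:
                clash.add(t)
            seen[t] = item
    return clash

def lalr_merge(s1, s2):
    """The LALR step: union lookahead sets of items sharing a core."""
    merged = {}
    for item, la in list(s1) + list(s2):
        merged[item] = merged.get(item, frozenset()) | la
    return {(item, la) for item, la in merged.items()}

print(conflicts(state_after_ae))     # empty: canonical LR(1) is fine
print(conflicts(lalr_merge(state_after_ae, state_after_be)))
                                     # both 'c' and 'd' now clash
```

Canonical LR(1) keeps the two states apart and parses happily; LALR(1)
folds them together and can no longer tell E-reductions from
F-reductions on either lookahead.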

Is this correct? This ignores all the compaction schemes I've only
ever found mentioned in the developers' papers... Maybe I shouldn't
bother and just teach LL?

Does anyone have a pointer to reports on the use of modern compiler-
compilers? What is cheap and easy for beginners to understand? I'm
teaching compiler construction for the first time and I can't stomach
teaching yacc...

--

Debora Weber-Wulff (Professorin fuer Softwaretechnik und Programmiersprachen)

Technische Fachhochschule Berlin, FB Informatik, Luxemburger Str. 10,

13353 Berlin, Germany email: weberwu@tfh-berlin.de

<http://www.tfh-berlin.de/usr1/name/weberwu/public_html/>
