
Related articles:
- Top-Down Parser Construction Conjectures bart@cs.uoregon.edu (1993-01-18)
- Re: Recursive descent parsing is better than LL(1) pjj@cs.man.ac.uk (Pete Jinks) (1993-01-27)
- Re: Recursive descent is better than LL(1) jan@klikspaan.si.hhs.nl (1993-01-29)

Newsgroups: comp.compilers
From: jan@klikspaan.si.hhs.nl
Keywords: LL(1), parse
Organization: Compilers Central
References: 93-01-122 93-01-204
Date: Fri, 29 Jan 1993 16:20:11 GMT

Barton Christopher Massey <bart@cs.uoregon.edu> writes:

> I would like to prove or disprove the following conjectures:
>
> LL(1) Transformability: There exists an algorithm which, given any grammar G
> which is unambiguous and describes an LL(1) language L(G), produces an LL(1)
> grammar G' for the language L(G).
>
> LL(1) Decidability: There exists an algorithm which, given any grammar G
> which is unambiguous and describes a language L(G), decides whether L(G) is
> an LL(1) language.
>
> It seems most likely to me that LL1T is true, but I'm skeptical about LL1D.

Bill Purvis <W.Purvis@daresbury.ac.uk> writes:

> ... SID ... accepts a fairly arbitrary BNF description ... produce an LL(1)
> grammar (assuming that the language is LL(1)) - the first conjecture.
> ... I suspect that in practical terms it probably covers the second conjecture.

jan@klikspaan.si.hhs.nl writes:

> I present a simple counterexample:
>
>   S -> A | B
>   A -> `x' A `y' | `a'
>   B -> `x' B `z' | `b'
>
> ... `SID may also report failure because
> its attempt to transform the grammar causes it to loop.'

Pete Jinks <pjj@cs.man.ac.uk> writes:

> Jan's example causes SID problems, so SID may establish LL1T but can't
> establish LL1D. I believe this is because his example is not LL(1), but
> no-one has said so explicitly so far and I can't find the example in the
> Dragon book - is it a well-known example?

I'm teaching compiler construction at the school mentioned below. I have
been doing some spare-time tinkering on the problem of transforming a
grammar to LL(1) form along the lines advocated by the `dragon book'.
That means it's done the same way as Foster's SID. There are some
differences though. My program accepts a `simple syntax directed
translation scheme' in the form of a `regular right part grammar'. The
translation scheme is expected to specify a translation to postfix code
that could easily be converted to an abstract syntax tree if you like. The
reason for this approach is that the transformation process messes up any
carefully designed grammar. The resultant abstract syntax trees, however,
remain invariant under the transformation, so you can ignore the messed-up
grammar. Any further translation can be defined in terms of an attribute
grammar over the resultant postfix code or, equivalently, over the
abstract syntax trees.
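A remark on why the postfix output is invariant: postfix code converts to an abstract syntax tree with a single stack, independently of which grammar variant produced it. A small sketch of that conversion (the operator table and names here are my own illustrative assumptions, not part of my program):

```python
# Rebuild an abstract syntax tree from postfix code with one stack.
# Symbols not in ARITY are treated as operands (arity 0).
ARITY = {'+': 2, '*': 2, 'neg': 1}

def postfix_to_ast(postfix):
    stack = []
    for sym in postfix:
        n = ARITY.get(sym, 0)
        args = [stack.pop() for _ in range(n)][::-1]  # restore operand order
        stack.append((sym, *args) if n else sym)
    (tree,) = stack            # exactly one tree must remain
    return tree

print(postfix_to_ast(['a', 'b', 'c', '*', '+']))
# ('+', 'a', ('*', 'b', 'c'))
```

Because the stack discipline only depends on the token sequence, any two grammars emitting the same postfix yield the same tree.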

I tested it mainly with two grammars: one derived from the Pascal grammar
in `Jensen & Wirth - Pascal: User Manual and Report', the other derived
from James Roskind's YACC grammar for C. On Pascal my program did a
marvelous job. On the Roskind C grammar it runs into an infinite loop. (I
wasn't aware of the problem at the time.) So, figuring out what went
wrong, I designed some small examples that run into the same problem.

The problem is of course the restricted set of transformations used. After

elimination of direct and indirect recursion only two transformations

remain:

- substitution of leading nonterminals in rules with conflicting FIRST-sets.

- factorization of common prefixes of rules.

Even simpler grammars might get SID and my program into a loop when we

restrict ourselves to these transformations:

Example:

S -> A | B

A -> `x' A | `a'

B -> `x' B | `b'

Substituting A and B in the S rules and factorizing gives:

S -> `x' S1 | `a' | `b'

S1 -> A | B

A -> `x' A | `a'

B -> `x' B | `b'

So you run into the same loop.
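The loop can be seen mechanically. Below is a minimal sketch of my own of the two transformations (not SID's actual code): a grammar is a dict mapping each nonterminal to a list of alternatives (tuples of symbols), and uppercase names are assumed to be nonterminals. One round of substitution plus factoring reproduces the original S rules under the fresh name S1:

```python
from collections import defaultdict

def substitute(grammar, rule):
    """Expand every alternative of `rule` that starts with a nonterminal."""
    new_alts = []
    for alt in grammar[rule]:
        if alt and alt[0].isupper():          # leading nonterminal
            for sub in grammar[alt[0]]:       # splice in its alternatives
                new_alts.append(sub + alt[1:])
        else:
            new_alts.append(alt)
    grammar[rule] = new_alts

def left_factor(grammar, rule, fresh):
    """Factor alternatives of `rule` that share a common first symbol."""
    groups = defaultdict(list)
    for alt in grammar[rule]:
        groups[alt[:1]].append(alt)
    new_alts = []
    for head, alts in groups.items():
        if len(alts) == 1:
            new_alts.append(alts[0])
        else:
            grammar[fresh] = [a[1:] for a in alts]  # fresh nonterminal for the tails
            new_alts.append(head + (fresh,))
    grammar[rule] = new_alts

g = {'S': [('A',), ('B',)],
     'A': [('x', 'A'), ('a',)],
     'B': [('x', 'B'), ('b',)]}
substitute(g, 'S')           # S -> x A | a | x B | b
left_factor(g, 'S', 'S1')    # S -> x S1 | a | b
print(g['S'])                # [('x', 'S1'), ('a',), ('b',)]
print(g['S1'])               # [('A',), ('B',)] - the original S rules again
```

S1 ends up with exactly the alternatives S started with, so repeating the process never terminates.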

Noting that the rules for S1 match the original rules for S, you might
substitute the name S for S1 to get:

S -> `x' S | `a' | `b'

That's a different transformation, however! At least Foster doesn't
mention it, nor does the dragon book.

This transformation can sometimes be used. It also works on the next example:

S -> A | B

A -> `x' A `y' | `a'

B -> `x' B `y' | `b'

It doesn't work, however, on the counterexample from my previous posting:

S -> A | B

A -> `x' A `y' | `a'

B -> `x' B `z' | `b'

I chose this example in my previous posting because the other examples
couldn't disprove any of the conjectures. However, it is not an LL(1)
language. In fact, the conjectures have not been disproved at all; only
the variant where we restrict our transformations to the set mentioned by
Foster has been. I don't expect the transformation suggested above to
solve the looping problem in the general case for LL(1) languages, but I
don't have a counterexample.

The wider approach to recursive descent parsing described in my previous
posting handles all the examples above. So why bother too much with strict
LL(1) languages? It's simple: anything that can be done with a table-driven
LL(1) parser can also be done with a recursive descent parser, but
the converse is not true. Recursive descent can do much more than just
handle LL(1) languages.
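To make that concrete: the counterexample language above (x^n a y^n versus x^n b z^n) needs unbounded lookahead to choose between A and B, so no LL(1) parser exists for it, yet a hand-written descent parser with backtracking handles it directly. A sketch, with function names of my own choosing:

```python
# Recursive descent with backtracking for
#   S -> A | B ;  A -> x A y | a ;  B -> x B z | b
# Each routine returns the index just past what it consumed, or None.
def parse(s):
    def A(i):
        if i < len(s) and s[i] == 'a':
            return i + 1
        if i < len(s) and s[i] == 'x':
            j = A(i + 1)
            if j is not None and j < len(s) and s[j] == 'y':
                return j + 1
        return None
    def B(i):
        if i < len(s) and s[i] == 'b':
            return i + 1
        if i < len(s) and s[i] == 'x':
            j = B(i + 1)
            if j is not None and j < len(s) and s[j] == 'z':
                return j + 1
        return None
    end = A(0)                         # try S -> A first ...
    if end is None or end != len(s):
        end = B(0)                     # ... and backtrack to S -> B
    return end == len(s)

print(parse('xxayy'))   # True
print(parse('xxbzz'))   # True
print(parse('xxayz'))   # False
```

The backtracking in S is exactly what a predictive LL(1) table cannot express: the decision between A and B depends on a symbol arbitrarily far ahead.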

Let's wind up this discussion:

Returning to Massey's second conjecture:

> LL(1) Decidability: There exists an algorithm which, given any grammar G
> which is unambiguous and describes a language L(G), decides whether L(G) is
> an LL(1) language.

LL(1) languages are only very weakly defined, as languages for which an
LL(1) grammar exists. There's no description of an LL(1) language without
reference to an LL(1) grammar. The `dragon book' states it, after showing
how the table of a table-driven parser is constructed, as: `A grammar
whose parsing table has no multiply-defined entries is said to be LL(1).'

So if you are lucky, your search for an LL(1) grammar may succeed.
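The grammar-level test is mechanical. Here is a sketch of my own, for ε-free grammars only (so FIRST sets alone decide; FOLLOW sets would be needed in general): a grammar is LL(1) iff, for every nonterminal, the FIRST sets of its alternatives are pairwise disjoint.

```python
# Grammars as dicts: nonterminal -> list of alternatives (symbol tuples).
# Anything not a key of the dict is a terminal. No empty alternatives.
def first(grammar, symbol, seen=frozenset()):
    if symbol not in grammar:          # terminal
        return {symbol}
    if symbol in seen:                 # guard against left recursion
        return set()
    out = set()
    for alt in grammar[symbol]:
        out |= first(grammar, alt[0], seen | {symbol})
    return out

def is_ll1(grammar):
    for alts in grammar.values():
        sets = [first(grammar, a[0]) for a in alts]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:  # a multiply-defined table entry
                    return False
    return True

# The counterexample: both alternatives of S can start with 'x'.
g = {'S': [('A',), ('B',)],
     'A': [('x', 'A', 'y'), ('a',)],
     'B': [('x', 'B', 'z'), ('b',)]}
print(is_ll1(g))   # False
```

Note that this decides whether the *grammar* is LL(1), not whether the *language* is, and that gap is exactly the problem discussed below.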

But how about proving that a language defined by an arbitrary grammar is
not LL(1)? Do we have to search an infinite universe of equivalent
grammars? After all, we don't have any better characterization of being
NOT LL(1) than `there's no grammar for this language in our infinite
universe of equivalent grammars . . . etc.' This tends toward an infinite
loop. We need a better characterization of being LL(1).

Popping our discussion stack again: what do we prefer? Education with
top-down or bottom-up techniques?

I present a bit of recursive descent parsing at a very early stage as part
of a course on data structures and algorithms. This follows the course on
ordinary Pascal and starts with pointers and recursion. I use an
elementary recursive descent technique based on syntax diagrams as a
means to point out that recursion isn't meant for stupid examples like
factorial functions, but is instead a very powerful programming technique.
As an exercise students have to read propositional expressions with the
operators and, or, not, <=>, =>, parentheses and a few variables, convert
the expressions to abstract syntax trees, interpret the trees, and print a
corresponding truth table. Just to let them taste it.
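For flavour, a compact sketch of the kind of exercise meant here (the students write it in Pascal; this sketch keeps only `not`, `and`, `or` and parentheses, leaving out `=>` and `<=>` for brevity):

```python
# Recursive descent over token lists, one routine per syntax diagram:
#   disj -> conj ('or' conj)* ;  conj -> atom ('and' atom)*
#   atom -> 'not' atom | '(' disj ')' | variable
from itertools import product

def parse(tokens):
    pos = [0]
    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None
    def eat(t):
        assert peek() == t, f"expected {t}"
        pos[0] += 1
    def atom():
        if peek() == '(':
            eat('('); t = disj(); eat(')'); return t
        if peek() == 'not':
            eat('not'); return ('not', atom())
        v = peek(); eat(v); return ('var', v)
    def conj():
        t = atom()
        while peek() == 'and':
            eat('and'); t = ('and', t, atom())
        return t
    def disj():
        t = conj()
        while peek() == 'or':
            eat('or'); t = ('or', t, conj())
        return t
    return disj()

def ev(tree, env):
    tag = tree[0]
    if tag == 'var': return env[tree[1]]
    if tag == 'not': return not ev(tree[1], env)
    if tag == 'and': return ev(tree[1], env) and ev(tree[2], env)
    return ev(tree[1], env) or ev(tree[2], env)

tree = parse(['not', 'p', 'or', 'q'])           # the abstract syntax tree
for p, q in product([False, True], repeat=2):   # the truth table
    print(p, q, ev(tree, {'p': p, 'q': q}))
```

Each parsing routine mirrors one syntax diagram, which is the pedagogical point: the recursion in the code is the recursion in the grammar.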

At a later stage my colleagues give separate courses on the theory of
grammars, languages, regular expressions etc. and on the usage of LEX and
YACC. Next comes a short course on the theory of compiler construction,
covering LL(1) and SLR(1) table construction, among other things. We end
with a practical assignment on building a translator with LEX and YACC for
a simplified version of Pascal (just integers and booleans, no type
constructions).

Concerning the choice between top-down and bottom-up, my advice is: take a
look at the language first. What did the inventors of the language have in
mind? Don't tell me they had no idea while defining the language.

Jan Schramp

--

Jan Schramp, Haagse Hogeschool - Sector Informatica. Louis Couperusplein 19,

2514 HP Den Haag, Netherlands. E-mail: jan@si.hhs.nl
