Re: Recursive descent is better than LL(1)

jan@klikspaan.si.hhs.nl
Fri, 29 Jan 1993 16:20:11 GMT

          From comp.compilers

Related articles
Top-Down Parser Construction Conjectures bart@cs.uoregon.edu (1993-01-18)
Re: Recursive descent parsing is better than LL(1) pjj@cs.man.ac.uk (Pete Jinks) (1993-01-27)
Re: Recursive descent is better than LL(1) jan@klikspaan.si.hhs.nl (1993-01-29)
| List of all articles for this month |

Newsgroups: comp.compilers
From: jan@klikspaan.si.hhs.nl
Keywords: LL(1), parse
Organization: Compilers Central
References: 93-01-122 93-01-204
Date: Fri, 29 Jan 1993 16:20:11 GMT

Barton Christopher Massey <bart@cs.uoregon.edu> writes:
>I would like to prove or disprove the following conjectures:
> LL(1) Transformability: There exists an algorithm which, given any grammar G
> which is unambiguous and describes an LL(1) language L(G), produces an LL(1)
> grammar G' for the language L(G).
> LL(1) Decidability: There exists an algorithm which, given any grammar G
> which is unambiguous and describes a language L(G), decides whether L(G) is
> an LL(1) language.
>it seems most likely to me that LL1T is true, but I'm skeptical about LL1D.


Bill Purvis <W.Purvis@daresbury.ac.uk> writes:
>... SID... accepts a fairly arbitrary BNF description... produce an LL(1)
>grammar (assuming that the language is LL(1). - the first conjecture.
>...I suspect that in practical terms it probably covers the second conjecture.
jan@klikspaan.si.hhs.nl writes:
>I present a simple contra-example:
> S -> A | B
> A -> `x' A `y' | `a'
> B -> `x' B `z' | `b'
>...`SID may also report failure because
>its attempt to transform the grammar causes it to loop.'


Pete Jinks <pjj@cs.man.ac.uk> writes:
> Jan's example causes SID problems, so SID may establish LL1T but can't
> establish LL1D. I believe this is because his example is not LL(1), but
> no-one has said so explicitly so far and I can't find the example in the
> Dragon book - is it a well-known example?


I'm teaching compiler construction on the school mentioned below. I have
been doing some spare time tinkering on the problem of transforming a
grammar to LL(1) form along the lines advocated by the `dragon book'.
That means it's done the same way as Foster's SID. There are some
differences though. My program accepts a `simple syntax directed
translation scheme' in the form of a `regular right part grammar'. The
translation scheme is expected to specify translation to postfixcode that
could easily be converted to an abstract syntax tree if you like. The
reason for this approach is that the transformation process messes up any
carefully designed grammar. The resultant abstract syntax trees however
remain invariant in the transformation process. So you can neglect the
messed up grammar. Any further translation can be defined in terms of an
attribute grammar for the resultant postfix code or equivalently the
abstract syntax trees.


I tested it mainly with two grammars: one derived from the Pascal-grammar
in `Jensen & Wirth - Pascal: User Manual and Report', the other derived
from James Roskind's YACC-grammar for C. On Pascal my program did a
marvelous job. On the Roskind-C grammar it runs into an infinite loop. (I
wasn't aware of the problem at the time.) So figuring out wat went wrong I
designed some small examples running into the same problem.


The problem is of course the restricted set of transformations used. After
elimination of direct and indirect recursion only two transformations
remain:
- substitution of leading nonterminals in rules with conflicting FIRST-sets.
- factorization of common prefixes of rules.


Even simpler grammars might get SID and my program into a loop when we
restrict ourselves to these transformations:


Example:
S -> A | B
A -> `x' A | `a'
B -> `x' B | `b'


Substituting A and B in the S rules and factorizing gives:


S -> `x' S1 | `a' | `b'
S1 -> A | B
A -> `x' A | `a'
B -> `x' B | `b'


So you run into the same loop.


Noting the rules of S1 match the original rules for S you might substitute
the name S for S1 to get:


S -> `x' S | `a' | `b'


That's a different transformation however! At least Foster doesn't mention
it, nor does the dragon book.


This transformation can be used sometimes. It works also on the next example:


S -> A | B
A -> `x' A `y' | `a'
B -> `x' B `y' | `b'


It doesn't work however on the contra-example from my previous posting:


S -> A | B
A -> `x' A `y' | `a'
B -> `x' B `z' | `b'


I chose this example in my previous posting while the other examples
couldn't disproof any of the conjectures. However it's no LL(1) language.


In fact the conjectures have not been disproved at all. Only if we
restrict our transformations to the set mentioned by Foster. I don't
expect the suggested transformation above solves the looping problem in
the general case for LL(1) languages, but I don't have any contra example.


The wider approach to recusive descent parsing as exposed in my previous
posting handles all the examples above. So why bother to much with strict
LL(1) languages. It's simple: anything that can be done with a table
driven LL(1) parser can also be done with a recursive descent parser. But
the converse is not true. Recursive descent can do much more than just
handling LL(1) languages.


Let's wind up this discussion:


Returning to Massey's second conjecture:


> LL(1) Decidability: There exists an algorithm which, given any grammar G
> which is unambiguous and describes a language L(G), decides whether L(G) is
> an LL(1) language.


LL(1) languages are only very weakly defined as languages for which an
LL(1) grammar exists. There's no description of an LL(1) language without
reference to an LL(1) grammar. `Dragon book' states, it after showing how
the table of a table driven parser is constructed, as `A grammar whose
parsing table has no multiply-defied entries is said to be LL(1).'.


So if you are lucky your search for an LL(1) grammar may succeed.


But how about proving a language defined by an arbitrary grammar is not
LL(1). Do we have to search an infinite universe of equivalent grammars?
After all we don't have any better characterization of being NOT LL(1)
than `there's no grammar for this language in our infinite universe of
equivalent grammars . . . etc.' This tends to an infinite loop. We need a
better characterization of being LL(1).


Popping our discussion stack again. What do we prefer? Education with
top-down or bottom-up techniques?


I present a bit of recursive descent parsing in a very early stage as part
of a course in datastructures and algorithms. This follows the course on
ordinairy Pascal and starts with pointers and recursion. I use
elementairy recursive descent technique based on syntax-diagrams as a
means to point out that recursion is'nt meant for stupid examples like
factorial functions, but instead a very powerfull programming technique.
As an exercise students have to read propositional expressions with
operators and or not <=> => parantheses and a few variables, have to
convert the expressions to abstract syntaxtrees, interprete the trees and
print a corresponding truth-table. Just to let them taste it.


In a later stage my collegues give seperate courses on the theory of
grammars, languages regular expressions etc. and on the usage of LEX and
YACC. Next a short course on theory of compiler construction containing
LL(1) and SLR(1) table construction, among other things. We end with a
practical assigment on building a translator with LEX and YACC for a
simplified version of Pascal (just integers and booleans, no type
constructions).


Concerning the choice between top-down and bottom-up my advice is: Take a
look at the language first. What did the inventors of the language have in
mind? Don't tell me they had no idea while defining the language.


Jan Schramp


--
Jan Schramp, Haagse Hogeschool - Sector Informatica. Louis Couperusplein 19,
2514 HP Den Haag, Netherlands. E-mail: jan@si.hhs.nl
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.