Re: Recursive descent and left recursion

schoebel@eicheinformatik.uni-stuttgart.de (Thomas Schoebel-Theuer)25 Jan 1997 21:59:35 -0500

From comp.compilers

Related articles
[3 earlier articles]
Re: Recursive descent and left recursion dlmoore@ix.netcom.com (David L Moore) (1997-01-16)
Re: Recursive descent and left recursion cfc@world.std.com (1997-01-16)
Re: Recursive descent and left recursion fjh@murlibobo.cs.mu.OZ.AU (1997-01-16)
Re: Recursive descent and left recursion cfc@world.std.com (1997-01-17)
Re: Recursive descent and left recursion will@ccs.neu.edu (William D Clinger) (1997-01-17)
Re: Recursive descent and left recursion cfc@world.std.com (Chris F Clark) (1997-01-21)
Re: Recursive descent and left recursion schoebel@eicheinformatik.uni-stuttgart.de (1997-01-25)
| List of all articles for this month |

 From: schoebel@eicheinformatik.uni-stuttgart.de (Thomas Schoebel-Theuer) Newsgroups: comp.compilers Date: 25 Jan 1997 21:59:35 -0500 Organization: Informatik, Uni Stuttgart, Germany References: 97-01-099 97-01-119 Keywords: parse

mfinney@lynchburg.net wrote:
|> However, I have been using recursive descent with left recursive
|> grammers for more than a decade. All it takes is the trivially
|> obvious check to allow the left recursion. Take, for example...
|>
|> (1) <exp> := <exp> + <term>
|> (2) <exp> := <term>
|>
|> when expanding (1), at the first term I simply check to see if the
|> current token in the input stream is the same as the last time that
|> the rewriting rule was expanded. If it is, then the parse has not
|> advanced and you have the infinite loop situation. I simply fail the
|> expansion and select an alternate rewriting rule for expansion.

Aehem, I think there is an error here: You try to get a
*deterministic* parser, i.e. a device avoiding backtracking from dead
ends, if I understood right. Your method of parse-time left recursion
elimination must not alter the accepted language, right? If you simply
fail one possible path, you cannot follow this path again if it should
be necessary later, by definiton of the "deterministicness". I see the
following problem: For example, the input <term> + <term> + <term>
cannot be parsed, because you have to choose the right number of
expansions of rule (1) *in advance* (i.e. at the first <term>) without
having seen the next tokens. At this point, you cannot know the number
of "+" that will appear later (if you don't have infinite lookahead).

A possible solution to this problem is what left corner parsers do
very regularly: you introduce a new instance of rule (1) *later* when
some additional "+" appears. This has a similar effect as transforming
the grammar to

(1') <exp> := <term> <expr'>
(2') <expr'> := + <expr>
(3') <expr'> := \epsilon

but continues to produce the *original* parse tree, i.e. you insert
the new instance of (1) "in the middle" of the already built parse
tree, und thus get the original left-recursive tree shape in contrast
to the right-recursive one of the modified grammar.

However, when you measure what is going on in terms of your original
grammar, the parse tree generation is either no longer top-down or no
longer deterministic, since you insert new rules "in the middle",
i.e. you *change* connections of your syntax tree later if it becomes
necessary. If you add inherited attributes, this may become a problem.

David L Moore <dlmoore@ix.netcom.com> writes:
|> At the risk of appearing thick, I do not see how this works. Surely,
|> given an expression like "1+2+3+4+5" I have to "pump" out four exp +
|> terms before I shift the first symbol (the 1) of the expression, so
|> the method suggested will not parse this expression.

Exactly, this is the problem.

|> What am I failing to grasp?

Nothing. Cycle detection alone is not sufficient. Leaving the normal
top-down depth-first evaluation order is only correct if all possible
paths can be (potentially) explored at all. So you can *rearrange*
(sort) the priority of evaluations in order to prevent infinite loops,
but you must not cut away any possible paths. However, changing the
order of evaluation may conflict with the LL(k) property: the
*deterministic parsing decision* must occur at the predictor step
(i.e. at the beginning of any rule), but you cannot decide the right
number of correct alternatives in your above example at this first
position.

So you cannot call a parser handling left-recursiveness LL(k) any
more. You can parse it deterministically (e.g. using LR(k)), but you
have to delay some parsing decicions after the predictor step. Note
that LR(k) is nothing but delaying the parsing decision until the
*end* of each rule, and exploring all possible paths occuring in the
meantime in parallel.

See my paper

@misc{Sch94,
author = {Thomas Sch\"obel-Theuer},
title = {Towards a Unified Theory of Context-Free Parsing},
howpublished = {presentation in: 1st ASMICS Workshop on Parsing Theory,
Dipartimento di Scienze dell'Informazione, Universita' degli Studi di Milano},
month = {13 October},
year = {1994}
}
@proceedings{ASMICS94,
editor = {Giovanni Pighizzini and Pierluigi San
Pietro},
title = {ASMICS Workshop on Parsing Theory},
organization = {ASMICS94, Rapporto Interno No 126-94, Dipartimento
di Scienze dell' Informazione, Universita' degli Studi di Milano},
month = {12--14 October},
year = {1994}
}

Greetings,

-- Thomas
--

Post a followup to this message