Re: Backtracking parsers

Michael Lee Finney <michael.finney@acm.org>
1 Jun 2000 18:05:50 -0400

          From comp.compilers

Related articles
Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-05-30)
Re: Backtracking parsers gneuner@dyn.com (2000-05-31)
Re: Backtracking parsers adrian@dcs.rhbnc.ac.uk (2000-06-01)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-01)
Re: Backtracking parsers gneuner@dyn.com (2000-06-06)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-09)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-10)
Re: Backtracking parsers cfc@world.std.com (Chris F Clark) (2000-06-14)
Re: Backtracking parsers gneuner@dyn.com (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-20)
[2 later articles]
| List of all articles for this month |

From: Michael Lee Finney <michael.finney@acm.org>
Newsgroups: comp.compilers
Date: 1 Jun 2000 18:05:50 -0400
Organization: Compilers Central
References: 00-05-111 00-05-119
Keywords: parse

gneuner@dyn.com says...
> > prod1 := (prod2 | prod3) prod4
> > prod2 := ...
> > prod3 := ...
> > prod4 := ...


> >If you are parsing prod1, have recognized prod2 and then fail to
> >recognize prod4, the usual method of applying backtracking would be to
> >discard the prod2 match and attempt to match prod3 followed by prod4.
> >As far as I know this is the way that backtracking is always done.


> >However, he suggested that what should happen is that, supposing prod3
> >contained alternatives, backtracking would occur at the bottom most,
> >right portion of prod3 with successive backtracks proceeding up the
> >parse tree for prod3.


> No insight, but an observation.


> If prod2 is itself a prefix to one of prod3's alternatives, it would
> make some sense to avoid reevaluating it when possible, but your
> example didn't specify this.


> Aside from the prefix case, both branches of


> prod1 := (prod2 prod4) | (prod3 prod4)


> would have to have been [at least] partially evaluated so that there
> is some prod3 state to be continued


Quite true. However, that is primary an efficiency issue. The issue
is that the two different methods can result in the recognition of
different languages.


The usual method appears to be based on the assumption that the
grammar defines a context free language where backtracking by entirely
discarding prod2 is exactly appropriate. The alternate approach would
be context sensitive, but not in the usual sense according to the
language classifications. Even for context sensitive grammars,
backtracking can be done in the usual manner, the only difference is
how productions are selected for replacement. There doesn't seem to
be much about the most general form of grammar which has no
restrictions on rewriting productions. Perhaps somebody knows of a
good reference that breaks that down a bit (outside of the usual
transformational grammars, attributed grammars, etc.)?


A different way of looking at it is the difference between open and
closed productions. The usual method would result from assuming that
each production is completely independent and completely encapsulated.
The alternate method would result from assuming that (at least some)
productions are macros or inline. That is actually a fairly
reasonable thing to do when you want to break a production up because
of its length, but then it could fail to recognize the grammar if you
break it up using the usual, closed productions.





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.