Re: Backtracking parsers

Michael Lee Finney <michael.finney@acm.org>
10 Jun 2000 15:10:19 -0400

          From comp.compilers

Related articles
Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-05-30)
Re: Backtracking parsers gneuner@dyn.com (2000-05-31)
Re: Backtracking parsers adrian@dcs.rhbnc.ac.uk (2000-06-01)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-01)
Re: Backtracking parsers gneuner@dyn.com (2000-06-06)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-09)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-10)
Re: Backtracking parsers cfc@world.std.com (Chris F Clark) (2000-06-14)
Re: Backtracking parsers gneuner@dyn.com (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-30)
| List of all articles for this month |
From: Michael Lee Finney <michael.finney@acm.org>
Newsgroups: comp.compilers
Date: 10 Jun 2000 15:10:19 -0400
Organization: Lynchburg.net (lynchburg.net)
References: 00-05-111 00-05-119 00-06-015 00-06-032
Keywords: parse

gneuner@dyn.com says...
> I may be dense but I'm still missing something ... I don't see how
> inline substitution of the constituant clauses in Prod1 is going to
> make recognition context sensitive if the language itself isn't.
>
> Even if the language is context sensitive and selection of the proper
> alternative requires gestault knowledge independent of the ongoing
> parse, I'm still left with the question:
>
> How do you "continue" a parse of Prod3 that hasn't yet begun?


Now, to answer the first question, consider the following expression...


p1:= .*! '*' '/'


where . means match any character, * means match the previous subexpression
zero or more times and ! means to reverse the order of the match results --
essentially, in this case, it means to match non-greedily. So the
expression above will match any number of characters up to the first */
encountered, unless there is none in which case the match will fail. Now
consider the expressions...


p2:= p3 '/'
p3:= .*! '*'


Here what will happen is that when an attempt is made to match p2, it will
first attempt to match p3 which will match any number of characters up to
the first * and if p3 matches successfully then p2 will attempt to match a
/. This is not the same result as the first pattern. Consider the
string...


      aaaa*bbbb*/


Then pattern p1 will match the entire string because it will match to the
first * and then attempt to match a / but that will fail because it will
find a 'b'. This means that the pattern match will backtrack to the .*!
which will continue to the next alternative which is the second '*' and
then a / will be found so that the entire match will succeed. Please note
that I have simplified the picture a little, but hopefully the basic
actions are clear.


Now, consider what happens when an attempt is made to match p2. First it
attempts to match p3 which succeeds by matching to the first *. It then
attempts to match a / but that fails. That triggers a backtrack, but since
p2 doesn't contain any alternates the entire pattern match will fail.


Now, if instead we had...


p4:= p5 '/'
p5 = .*! '*'


where '=' is defining an inline pattern expression, the result of matching
p4 will be the same as when matching p1 because, after substituting the
inline p5, you effectively get....


p4:= (.*! '*') '/'


which is essentially the same as p1 (the extra parenthesis play no
significant role in this case).


Now, the original question was about how p2 should behave when failing to
match / caused a backtrack attempt. The conventional approach is as above
where p3 is completely discarded and so the match for p2 would fail. The
alternate suggestion is that backtracking should go into "p3" and backtrack
there.


The reason that the difference is one of context free vs. context sensitive
is that p3 will normally only match up to the first *. But, if the
surrounding context forced a backtrack into p3 then the second * would be
matched. So treating backtracking in this manner is a difference of
context free and context sensitive patterns where "closed" patterns are
context free and "open" (inline) patterns are context sensitive.


Thus, by providing both pattern expressions and pattern macros, one can get
either behaviour. This is important because sometimes long patterns are
broken into pieces for readability and organization. But, clearly,
breaking a pattern is not free from side effects. Thus allowing inline
patterns means that breaking larger patterns into two or more smaller
patterns can be done without changing the meaning of the original pattern.
This is not possible in conventional BNF (or E-BNF), or indeed, in most
other systems used to build complex grammars.





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.