Re: Backtracking parsers

Michael Lee Finney <michael.finney@acm.org>
20 Jun 2000 02:30:18 -0400

          From comp.compilers

Related articles
[3 earlier articles]
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-01)
Re: Backtracking parsers gneuner@dyn.com (2000-06-06)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-09)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-10)
Re: Backtracking parsers cfc@world.std.com (Chris F Clark) (2000-06-14)
Re: Backtracking parsers gneuner@dyn.com (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-20)
Re: Backtracking parsers michael.finney@acm.org (Michael Lee Finney) (2000-06-30)
| List of all articles for this month |

From: Michael Lee Finney <michael.finney@acm.org>
Newsgroups: comp.compilers
Date: 20 Jun 2000 02:30:18 -0400
Organization: Lynchburg.net (lynchburg.net)
References: 00-05-111 00-05-119 00-06-015 00-06-032 00-06-037 00-06-050
Keywords: parse

Chris Clark cfc@world.std.com says...
> Michael Lee Finney <michael.finney@acm.org> wrote a fascinating article on
> inline substitution, in which he talked about backtracking in the following
> case:
>
> > p1:= .*! '*' '/'
>
> I'm not quite certain how your ! operator works. More examples might
> help enlighten me. It looks like a very useful regexp extension. (Is
> it somehow related to the prolog "cut" operator or something from
> snobol or icon? Both my prolog and snobol are so rusty I can't
> remember anything but the most general nature of them.)


Some rexexp would write the above as .*? but the ! operator is a bit
more general than that. In pattern theory, you have a "cursor" which
is a pair <s,I> where s is a string of symbols over some alphabet and
I is a pointer to locations between symbols and would thus range from
0 to n, where n is the number of symbols in a string. A pattern is
then a function which accepts a cursor and returns a sequence of
cursors. The elements of that sequence are alternates. So, .* is a
pattern which would return [n, n-1, ..., 1, 0] and .*! would then
return [0,1,...,n-1,n]. Essentially, ! reverses the returned
sequence. In terms of .* and regexp, .* is, in most implementations,
"greedy" and so returns the first sequence of cursor. And .*? is
"non-greedy" and returns the second sequence. I have defined ! to
reverse the returned sequence, so that !! has no effect on the
returned sequence.


This is not the same as the "cut" operator or the "fence" operator in
SNOBOL4. The purpose of "fence" is to cause the current pattern match
to fail if backtracking backs into the fence operator.


> However, there is another approach to controlling backtracking that
> semes to cover the same ground as your inline macro substitution.
> It is syntactic predicates.


In reading the explanation of syntactic predicates you gave (and which
I snipped for brevity), it appears as if a syntactic predicate is
simply a positive lookahead in terms of a regexp. I believe that one
notation in use is (?= regexp) which will match 0 characters only if
regexp matches at the current position. There is also negative
lookahead as in (?! regexp) which will match zero characters only if
regexp does not match at the current position. From your description,
it appears as if the semantic predicates are limited to positive
lookahead and do not support negative lookahead. If I have
misunderstood your explanation, please let me know.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.