Re: POSIX regex subexpression matching semantics

vlaurika@cs.hut.fi (Ville Laurikari)
5 Apr 2000 22:18:43 -0400

          From comp.compilers

Related articles
POSIX regex subexpression matching semantics vlaurika@cs.hut.fi (2000-04-01)
Re: POSIX regex subexpression matching semantics monnier+comp/compilers/news/@flint.cs.yale.edu (Stefan Monnier) (2000-04-03)
Re: POSIX regex subexpression matching semantics vlaurika@cs.hut.fi (2000-04-05)
| List of all articles for this month |

From: vlaurika@cs.hut.fi (Ville Laurikari)
Newsgroups: comp.compilers
Date: 5 Apr 2000 22:18:43 -0400
Organization: Helsinki University of Technology, CS lab
References: 00-04-021 00-04-039
Keywords: DFA, lex

Stefan Monnier <monnier+comp/compilers/news/@flint.cs.yale.edu> wrote:
> I'd thus argue that when `(a*)*' is matched against `bc' the subexpression
> should not match at all (i.e. the outer repetition operator loops 0
> times rather than once).
> And more generally, if the body of a repetition operator can match the empty
> string, it should only match such an empty string if necessary (since
> it will always be possible). Matching the empty string is made necessary
> when there is a minimum number of matches as is the case with `+' and
> with `{n,m}'.


This seems reasonable.


> The rule that subexpressions also should match the longest match is
> often not properly followed by backtracking implementations (i.e.
> all implementations I can think of that allow the use of backreferences.
> I'm actually curious: has anyone tried to extend an NFA or DFA so
> as to allow matching backreferences ?).


There is GNU Rx (a.k.a. librx), which is supposed to comply with the
POSIX regex spec in every way, and seems to follow the longest match
rule quite well. There were still some bugs/misfeatures in version
1.8a when I tried it out, though.


--
http://www.iki.fi/vl/


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.