|POSIX regex subexpression matching semantics email@example.com (2000-04-01)|
|Re: POSIX regex subexpression matching semantics firstname.lastname@example.org (Stefan Monnier) (2000-04-03)|
|Re: POSIX regex subexpression matching semantics email@example.com (2000-04-05)|
|From:||firstname.lastname@example.org (Ville Laurikari)|
|Date:||1 Apr 2000 14:10:30 -0500|
|Organization:||Helsinki University of Technology, CS lab|
There's the POSIX 1003.2 spec and the leftmost-longest match rule.
Fine. Then there's the implication (of the leftmost rule) that
higher-level subexpressions take priority over their lower-level
component subexpressions. OK. But what does this mean when nested
repeats are used?
Take ((a)*)* and the string "aaaaa" as an example. For every "a"
read, the innermost "a" in the expression naturally matches. But
should the inner (a)* consume the whole string, so that the outermost
repeat operator matches only once? Or should the inner (a)* consume only
one "a" at a time and let the outer repeat operator do the repeating,
since the outer expression is prioritized over the inner expression?
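
To make the ambiguity concrete, here is a minimal Python sketch that enumerates the ways the outer repeat of ((a)*)* could split "aaaaa" and shows what each split would report for the two groups. It does not call any regex library. The helper names are made up for illustration, empty iterations are ignored, and it assumes that a group reports the span of its last iteration.

```python
def compositions(n):
    """Yield every way to write n as an ordered sum of positive integers."""
    if n == 0:
        yield []
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield [first] + rest


def group_spans(chunks):
    """Return (group 1 span, group 2 span) for one parse of ((a)*)*.

    chunks[i] is the number of "a"s consumed by the i-th pass of the
    outer repeat. Assumption: each group reports its last iteration.
    """
    end = sum(chunks)
    group1 = (end - chunks[-1], end)  # last run matched by the inner (a)*
    group2 = (end - 1, end)           # last single "a"
    return group1, group2


text = "aaaaa"
parses = list(compositions(len(text)))

# The two readings asked about: one outer pass, or one pass per "a".
one_outer_pass = group_spans([len(text)])
five_outer_passes = group_spans([1] * len(text))

print(len(parses), "candidate parses")
print("inner repeat eats everything:", one_outer_pass)
print("outer repeat does the work:  ", five_outer_passes)
```

Under these assumptions the whole match is the same in every parse; only the span reported for the outer group differs, which is exactly what the spec would need to pin down.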
How about (a|(a*b*))* and the string "aaaaabbbbbbbb" then? Should the
left operand of the union consume the "a"s one at a time, driven by the
outermost repeat operator, and then let b* eat the rest? This would
be the leftmost match in some sense. Or is the correct way to let the
right operand of the union eat the whole string in a single pass?
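
The two readings of (a|(a*b*))* can be written down as explicit parses and compared. The following Python sketch is only an illustration: the parse representation and the helper names are invented here, and it again assumes that a group reports the span of its last participation. Python's `re` module is used only to check each piece against its own branch, not to decide between the readings.

```python
import re

# Sub-patterns for the two operands of the union in (a|(a*b*))*.
BRANCH = {"left": r"a", "right": r"a*b*"}


def is_valid(iterations, text):
    """Check that the iterations tile text and each piece fits its branch."""
    pos = 0
    for branch, start, end in iterations:
        if start != pos or not re.fullmatch(BRANCH[branch], text[start:end]):
            return False
        pos = end
    return pos == len(text)


def last_spans(iterations):
    """Return (group 1 span, group 2 span) for one parse.

    Group 1 is the whole union, group 2 is (a*b*).
    Assumption: each group reports its last participation.
    """
    group1 = iterations[-1][1:]
    group2 = None
    for branch, start, end in iterations:
        if branch == "right":
            group2 = (start, end)
    return group1, group2


text = "aaaaabbbbbbbb"
n_a = text.index("b")  # number of leading "a"s

# Reading 1: the left operand takes each "a", then a*b* takes the "b"s.
left_first = [("left", i, i + 1) for i in range(n_a)]
left_first.append(("right", n_a, len(text)))

# Reading 2: the right operand takes the whole string in one pass.
right_only = [("right", 0, len(text))]

for name, parse in (("left first", left_first), ("right only", right_only)):
    print(name, is_valid(parse, text), last_spans(parse))
```

Both parses cover the whole string, so the overall match is identical. In the first reading both groups would end up spanning only the run of "b"s, while in the second both would span the entire string.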
The GNU regex(7) man page says that when (a*)* is matched against
"bc" both the whole RE _and_ the parenthesized subexpression match the
empty string. If this is so, when matched against "aa" shouldn't the
whole RE match the whole string first _and then_ the empty string at