POSIX regex subexpression matching semantics

vlaurika@cs.hut.fi (Ville Laurikari)
1 Apr 2000 14:10:30 -0500

          From comp.compilers

Related articles
POSIX regex subexpression matching semantics vlaurika@cs.hut.fi (2000-04-01)
Re: POSIX regex subexpression matching semantics monnier+comp/compilers/news/@flint.cs.yale.edu (Stefan Monnier) (2000-04-03)
Re: POSIX regex subexpression matching semantics vlaurika@cs.hut.fi (2000-04-05)
| List of all articles for this month |

From: vlaurika@cs.hut.fi (Ville Laurikari)
Newsgroups: comp.compilers
Date: 1 Apr 2000 14:10:30 -0500
Organization: Helsinki University of Technology, CS lab
Keywords: lex


There's the POSIX 1003.2 spec and the leftmost longest match rule.
Fine. Then there's the implication (of the leftmost rule) that
higher-level subexpressions take priority over their lower-level
component subexpressions. OK. But, what does this mean when nested
repeats are used?

Take ((a)*)* and the string "aaaaa" as an example. For every "a"
read, the innermost "a" in the expression naturally matches. But,
should the inner (a)* consume the whole string, so that the outermost
repeat operator matches once? Or, should the inner (a)* consume only
one "a" at a time and let the outer repeat operator do the repeating
since the outer expression is prioritized over the inner expression?

How about (a|(a*b*))* and the string "aaaaabbbbbbbb" then? Should the
left operand of the union with the outermost repeat operator consume
the "a"'s from the string, and then let b* eat the rest? This would
be the leftmost match in some sense. Or is the correct way to let the
right operand of the union eat the whole string?

The GNU regex(7) man page says that when (a*)* is matched against
"bc" both the whole RE _and_ the parenthesized subexpression match the
empty string. If this is so, when matched against "aa" shouldn't the
whole RE match the whole string first _and then_ the empty string at
the end?


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.