Re: Question on regular expression matching

Henry Spencer <henry@zoo.toronto.edu>
18 Mar 1998 22:51:57 -0500

From comp.compilers

Related articles
Question on regular expression matching wolf@di.epfl.ch (1998-03-15)
*Re: Question on regular expression matching henry@zoo.toronto.edu (Henry Spencer)* (1998-03-18)**

| List of all articles for this month |

From:	Henry Spencer <henry@zoo.toronto.edu>
Newsgroups:	comp.compilers
Date:	18 Mar 1998 22:51:57 -0500
Organization:	SP Systems, Toronto, Canada
References:	98-03-126
Keywords:	lex

Thomas Wolf <wolf@di.epfl.ch> wrote:
>What is the supposed result of POSIX-style subexpression matches for
>ambiguous regular expressions? As an example, consider the RE
>"((a)(b|c)|(acab))*" matching the string "acab". What should the
>subexpression matches be in this case?

Within an expression, the POSIX spec says that subexpressions match the
longest substrings they can, and that earlier subexpressions have priority
over later ones. (The wording of the spec is admittedly less than ideal.)

Here, the earliest subexpression is the one that starts earliest: the one
in the outermost set of parentheses. It starts out by matching the
longest string it can, using its second branch. That gobbles up "acab".
Then it tries to match again because of the "*", but can't.

That having been settled... Just how that subexpression matches its
substring is sorted out the same way. The earlier subexpressions again
get priority, but that makes no difference in this case because the first
branch cannot match "acab" and the second must be used.

Earlier (or outer) subexpressions match as much as they can, and later
(or inner) ones do what they can within those constraints.

So match[0], [1], and [4] are all the whole string, and match[2] and [3]
are void since those subexpressions did not participate.

>...Would a match result reporting a
>null match of the "*" at the end of "acab" be possible, too?

Only if you started the match attempt there. POSIX REs always match
starting as early as they can.
--
| Henry Spencer
| henry@zoo.toronto.edu
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Question on regular expression matching

Henry Spencer <henry@zoo.toronto.edu>18 Mar 1998 22:51:57 -0500

Henry Spencer <henry@zoo.toronto.edu>
18 Mar 1998 22:51:57 -0500