Question on regular expression matching

wolf@di.epfl.ch (Thomas Wolf)
15 Mar 1998 00:18:43 -0500

From comp.compilers

Related articles
*Question on regular expression matching wolf@di.epfl.ch* (1998-03-15)**
Re: Question on regular expression matching henry@zoo.toronto.edu (Henry Spencer) (1998-03-18)

| List of all articles for this month |

From:	wolf@di.epfl.ch (Thomas Wolf)
Newsgroups:	comp.compilers
Date:	15 Mar 1998 00:18:43 -0500
Organization:	Compilers Central
Keywords:	lex, question

What is the supposed result of POSIX-style subexpression matches for
ambiguous regular expressions? As an example, consider the RE
"((a)(b|c)|(acab))*" matching the string "acab". What should the
subexpression matches be in this case?

I would expect either

    match[0] = {0, 4} The whole RE
    match[1] = {0, 4} The "*"
    match[2] = {-1, -1} Yes, "(a)" matched at pos. 2, but we took the
                                            other branch of the "|".
    match[3] = {-1, -1} Again, we took the other branch ('cause it's
                                            longer).
    match[4] = {0, 4} "(acab)" matched the whole string.

or

    match[0] = {0, 4} The whole RE
    match[1] = {2, 4} The last time the "*" is considered
    match[2] = {2, 3} Match of "(a)" in the last iteration of the "*"
    match[3] = {3, 4} Match of "(b|c)" in the last iteration
    match[4] = {-1, -1} Although it matches the whole input, it doesn't
                                            do so on the last iteration.

Which of these is it, and why? If it's the second one, would the
result change if the RE was written "((acab)|(a)(b|c))*" (besides
the different subexpression numbering, of course)? And if it's
neither, where did I go wrong? Would a match result reporting a
null match of the "*" at the end of "acab" be possible, too?

Personally, I think it should be the first one, for otherwise matching
the RE "((a)(b|c)|(acax))*" against the string "acax" would report a
full match, but claim that none of the subexpressions matched!?
--
Thomas Wolf
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Question on regular expression matching

wolf@di.epfl.ch (Thomas Wolf)15 Mar 1998 00:18:43 -0500

wolf@di.epfl.ch (Thomas Wolf)
15 Mar 1998 00:18:43 -0500