Related articles |
---|
Question on regular expression matching wolf@di.epfl.ch (1998-03-15) |
Re: Question on regular expression matching henry@zoo.toronto.edu (Henry Spencer) (1998-03-18) |
From: | wolf@di.epfl.ch (Thomas Wolf) |
Newsgroups: | comp.compilers |
Date: | 15 Mar 1998 00:18:43 -0500 |
Organization: | Compilers Central |
Keywords: | lex, question |
What is the supposed result of POSIX-style subexpression matches for
ambiguous regular expressions? As an example, consider the RE
"((a)(b|c)|(acab))*" matching the string "acab". What should the
subexpression matches be in this case?
I would expect either
match[0] = {0, 4} The whole RE
match[1] = {0, 4} The "*"
match[2] = {-1, -1} Yes, "(a)" matched at pos. 2, but we took the
other branch of the "|".
match[3] = {-1, -1} Again, we took the other branch ('cause it's
longer).
match[4] = {0, 4} "(acab)" matched the whole string.
or
match[0] = {0, 4} The whole RE
match[1] = {2, 4} The last time the "*" is considered
match[2] = {2, 3} Match of "(a)" in the last iteration of the "*"
match[3] = {3, 4} Match of "(b|c)" in the last iteration
match[4] = {-1, -1} Although it matches the whole input, it doesn't
do so on the last iteration.
Which of these is it, and why? If it's the second one, would the
result change if the RE was written "((acab)|(a)(b|c))*" (besides
the different subexpression numbering, of course)? And if it's
neither, where did I go wrong? Would a match result reporting a
null match of the "*" at the end of "acab" be possible, too?
Personally, I think it should be the first one, for otherwise matching
the RE "((a)(b|c)|(acax))*" against the string "acax" would report a
full match, but claim that none of the subexpressions matched!?
--
Thomas Wolf
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.