From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Newsgroups: comp.compilers,comp.theory
Date: Sun, 05 Jul 2009 21:11:46 +0200
Organization: University of Oslo, Norway
References: 09-07-003
Keywords: lex, theory
Posted-Date: 05 Jul 2009 15:54:05 EDT
dave_140390@hotmail.com writes:
> I have been wondering, with limited success, how to rewrite a regexp
> without word boundaries.
Why do you want to? Most likely, the answer is that your regexps are
getting too clever and thus too unreadable/bug-prone, so you should
break them up and use more ordinary programming instead.
However:
> (...)
> Thus, regexp "ex" matches "example" and "text", whereas regexp "\bex"
> matches "example" but not "text".
>
> Now, if "\b" occurs at the beginning (or end) of the regexp, I think
> it's easy to rewrite the regexp without using "\b". For example,
> "\bex" could be rewritten as "\Wex".
No, that would not match "ex" at the beginning of the string being
matched, since \W must consume a character. You could use (?:^|\W)ex.
But the matched substring (Perl's $&) will differ from that of \bex,
because it includes the preceding non-word character, so it depends on
how the regexp is used.
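To make the difference concrete, here is a small sketch using Python's
re module rather than Perl (my choice for illustration; the sample
string and the helper name matched() are made up). It shows that \Wex
misses "ex" at the start of the string, that (?:^|\W)ex finds it but
returns a longer matched substring, and that a capture group around
"ex" recovers the same start positions as \bex.

```python
import re

def matched(pattern, text):
    """Return the matched substring of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Hypothetical sample: "ex" at string start, inside "text",
# at the start of "example", and after a punctuation character.
text = "ex text example .ex"

with_boundary = matched(r"\bex", text)       # uses the word boundary
naive = matched(r"\Wex", text)               # misses "ex" at offset 0
anchored = matched(r"(?:^|\W)ex", text)      # finds it, but consumes \W

# A capture group recovers the position of "ex" itself.
boundary_starts = [m.start() for m in re.finditer(r"\bex", text)]
group_starts = [m.start(1) for m in re.finditer(r"(?:^|\W)(ex)", text)]

print(with_boundary, naive, anchored)
print(boundary_starts == group_starts)
```

Whether the extra consumed character matters depends, as said, on what
is done with the match afterwards.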
> But what if "\b" occurs within the regexp? For example, how to get rid
> of "\b" in "<RE>\bex" (with "<RE>" being any regexp)? "<RE>\Wex"
> wouldn't work here: for example (with "<RE>" = "\W"), "\W\Wex" is not
> equivalent to "\W\bex".
Rewrite <RE> into a regexp which ends with \W or with (?:\W|^), and
which is equivalent to <RE> in the cases where it is followed by \b\w.
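As a sketch of what that can look like, again in Python and with
hand-picked examples of <RE> (the sample strings and the helper names
are my own assumptions): for <RE> = \W the \b is simply redundant, so
\W\bex becomes \Wex, and for <RE> = [a-z ] only the space alternative
can precede a word boundary before "e", so [a-z ]\bex becomes " ex".
The check below only compares match spans on a few strings; it is not a
proof of equivalence.

```python
import re

def spans(pattern, text):
    """Return (start, end) for every non-overlapping match."""
    return [m.span() for m in re.finditer(pattern, text)]

def same_matches(p, q, texts):
    """True if both patterns yield identical spans on every text."""
    return all(spans(p, t) == spans(q, t) for t in texts)

# Hypothetical test strings, chosen to cover start-of-string,
# mid-word, and one or two preceding non-word characters.
samples = ["ex", " ex", "  ex", "text", "a.ex", "an example", "..ex"]

# <RE> = \W: a non-word character followed by "e" is always a boundary.
nonword_ok = same_matches(r"\W\bex", r"\Wex", samples)

# <RE> = [a-z ]: a letter before "e" can never be followed by \b.
class_ok = same_matches(r"[a-z ]\bex", r" ex", samples)

# The rewrite from the question, \W\Wex, really is different.
naive_differs = spans(r"\W\Wex", " ex") != spans(r"\W\bex", " ex")

print(nonword_ok, class_ok, naive_differs)
```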
--
Hallvard