Re: How to rewrite a regexp without word boundaries?

Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Sun, 05 Jul 2009 21:11:46 +0200

From comp.compilers

Related articles
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05)
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06)
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13)
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16)
[1 later articles]


From:	Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Newsgroups:	comp.compilers,comp.theory
Date:	Sun, 05 Jul 2009 21:11:46 +0200
Organization:	University of Oslo, Norway
References:	09-07-003
Keywords:	lex, theory
Posted-Date:	05 Jul 2009 15:54:05 EDT

dave_140390@hotmail.com writes:

> I have been wondering, with limited success, how to rewrite a regexp
> without word boundaries.

Why do you want to? Most likely, the answer is that your regexps are
getting too clever and thus too unreadable/bug-prone, so you should
break them up and use more ordinary programming instead.

However:

> (...)
> Thus, regexp "ex" matches "example" and "text", whereas regexp "\bex"
> matches "example" but not "text".
>
> Now, if "\b" occurs at the beginning (or end) of the regexp, I think
> it's easy to rewrite the regexp without using "\b". For example,
> "\bex" could be rewritten as "\Wex".

No, that would not match "ex" at the beginning of the string being
matched. You could use (?:^|\W)ex. But the matched substring (Perl's
$&) will differ from the one \bex gives, since it also includes the
preceding non-word character, so it depends on how the regexp is used.
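
A minimal sketch of the difference, in Python's re module rather than Perl (an assumption; the original question names no particular engine). The capture group around "ex" is an addition of this sketch, one possible way to get back the same text that \bex would report; the helper matched_text is hypothetical.

```python
import re

WORD_BOUNDARY = re.compile(r"\bex")      # the original pattern
NAIVE = re.compile(r"\Wex")              # the proposed rewrite from the question
REWRITTEN = re.compile(r"(?:^|\W)(ex)")  # the rewrite above, with "ex" captured


def matched_text(pattern, text, group=0):
    """Return the text of the first match (or of one group), or None."""
    m = pattern.search(text)
    return m.group(group) if m else None


samples = ["example", "an example", "text"]

# For each sample: (\bex, \Wex, whole rewritten match, captured "ex")
results = {
    s: (
        matched_text(WORD_BOUNDARY, s),
        matched_text(NAIVE, s),
        matched_text(REWRITTEN, s),
        matched_text(REWRITTEN, s, 1),
    )
    for s in samples
}

for s, row in results.items():
    print(repr(s), row)
```

On "an example" the whole match of the rewritten pattern starts one character earlier than that of \bex, while the captured group agrees with it. Whether that matters depends on whether the caller looks at the whole match or only at the group.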

> But what if "\b" occurs within the regexp? For example, how to get rid
> of "\b" in "<RE>\bex" (with "<RE>" being any regexp)? "<RE>\Wex"
> wouldn't work here: for example (with "<RE>" = "\W"), "\W\Wex" is not
> equivalent to "\W\bex".

Rewrite <RE> as a regexp which ends with \W or (?:\W|^) and is
equivalent to the original in the cases where it is followed by \b\w.
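
For the quoted case <RE> = \W, the regexp already ends with \W, so the \b before "ex" is always satisfied and can simply be dropped. A small check of that, again in Python's re module (an assumption about the engine); the sample strings and the helper span are made up for illustration, and agreement on a handful of samples is of course not a proof of equivalence.

```python
import re

ORIGINAL = re.compile(r"\W\bex")
REWRITTEN = re.compile(r"\Wex")   # <RE> = \W already ends with \W: drop \b
WRONG = re.compile(r"\W\Wex")     # the substitution rejected in the question


def span(pattern, text):
    """Return the (start, end) span of the first match, or None."""
    m = pattern.search(text)
    return m.span() if m else None


# Illustrative samples only.
samples = ["example", "an example", "a  example", "text", "(ex)", "--ex"]

# True if the rewrite finds the same span as the original on every sample.
same = all(span(ORIGINAL, s) == span(REWRITTEN, s) for s in samples)

# Samples on which replacing \b by \W changes the result.
wrong_differs = [s for s in samples if span(ORIGINAL, s) != span(WRONG, s)]

print(same, wrong_differs)
```

A less trivial <RE> needs its own case analysis of what its last matched character can be; this only covers the simplest instance.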

--
Hallvard
