Re: How to rewrite a regexp without word boundaries?

Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Sun, 05 Jul 2009 21:11:46 +0200

          From comp.compilers

From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Newsgroups: comp.compilers,comp.theory
Date: Sun, 05 Jul 2009 21:11:46 +0200
Organization: University of Oslo, Norway
References: 09-07-003
Keywords: lex, theory
Posted-Date: 05 Jul 2009 15:54:05 EDT

dave_140390@hotmail.com writes:


> I have been wondering, with limited success, how to rewrite a regexp
> without word boundaries.


Why do you want to? Most likely, the answer is that your regexps are
getting too clever and thus too unreadable/bug-prone, so you should
break them up and use more ordinary programming instead.


However:


> (...)
> Thus, regexp "ex" matches "example" and "text", whereas regexp "\bex"
> matches "example" but not "text".
>
> Now, if "\b" occurs at the beginning (or end) of the regexp, I think
> it's easy to rewrite the regexp without using "\b". For example,
> "\bex" could be rewritten as "\Wex".


No, that would not match "ex" at the beginning of the string being
matched. You could use (?:^|\W)ex. But the matched substring (Perl's
$&) will differ from what \bex matches, since it includes the preceding
non-word character, so it depends on how the regexp is used.
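
To make the difference concrete, here is a small sketch in Python rather
than Perl, assuming Python's re module treats \b, \W and (?:...) the same
way for these ASCII inputs. The helper name matched() and the sample
strings are made up for illustration; group(0) plays the role of Perl's $&.

```python
import re

def matched(pattern, text):
    """Return the matched substring (like Perl's $&), or None if no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

samples = ["example", "text", "an example"]

for pattern in (r"\bex", r"\Wex", r"(?:^|\W)ex"):
    print(pattern, [matched(pattern, s) for s in samples])

# \bex        finds "ex" in "example" and in "an example", nothing in "text".
# \Wex        misses "example" entirely: there is no character before "ex".
# (?:^|\W)ex  finds both, but in "an example" the match is " ex", not "ex".
```

If only the "ex" part is wanted, one option is to capture it, as in
(?:^|\W)(ex), and read the group instead of the whole match.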


> But what if "\b" occurs within the regexp? For example, how to get rid
> of "\b" in "<RE>\bex" (with "<RE>" being any regexp)? "<RE>\Wex"
> wouldn't work here: for example (with "<RE>" = "\W"), "\W\Wex" is not
> equivalent to "\W\bex".


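Rewrite <RE> as a regexp which ends with \W or (?:^|\W) and is equivalent
in the cases where it is followed by \b\w. With your example <RE> = \W,
nothing needs to change: \W\bex is just \Wex, because the \W that <RE>
matches is itself the non-word character in front of "ex".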
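
A brute-force check of the <RE> = \W case, again sketched in Python under
the assumption that its re module agrees with Perl here. The alphabet, the
length limit and the helper span() are arbitrary choices for illustration;
this is a sanity test over short strings, not a proof.

```python
import re
from itertools import product

def span(pattern, text):
    """Return the (start, end) of the first match, or None if no match."""
    m = re.search(pattern, text)
    return m.span() if m else None

# All strings of length 0..5 over two word and two non-word characters.
ALPHABET = "ex -"
texts = ["".join(chars)
         for n in range(6)
         for chars in product(ALPHABET, repeat=n)]

with_boundary = r"\W\bex"   # <RE>\bex with <RE> = \W
rewritten = r"\Wex"         # <RE> already ends with \W, so just drop \b
naive = r"\W\Wex"           # the <RE>\Wex attempt from the question

rewrite_agrees = all(span(with_boundary, t) == span(rewritten, t)
                     for t in texts)
naive_counterexamples = [t for t in texts
                         if span(with_boundary, t) != span(naive, t)]

print("rewrite agrees on all inputs:", rewrite_agrees)
print("inputs where the naive version differs:", len(naive_counterexamples))
```

The string " ex" is one of the inputs where \W\Wex disagrees with \W\bex:
the original needs only one non-word character before "ex", the naive
version demands two.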


--
Hallvard

