Related articles |
---|
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06) |
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13) |
[2 later articles] |
From: | dave_140390@hotmail.com |
Newsgroups: | comp.compilers,comp.theory |
Date: | Sun, 5 Jul 2009 02:02:08 -0700 (PDT) |
Organization: | Compilers Central |
Keywords: | lex, theory, question |
Posted-Date: | 05 Jul 2009 09:29:57 EDT |
Hi,
I have been wondering, with limited success, how to rewrite a regexp
without word boundaries.
According to http://www.perl.com/doc/manual/html/pod/perlre.html:
<from Perl manual>
In addition, Perl defines the following:
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-word character
[...]
Perl defines the following zero-width assertions:
\b Match a word boundary
[...]
A word boundary (\b) is defined as a spot between two characters
that has a \w on one side of it and a \W on the other side of it (in
either order), counting the imaginary characters off the beginning and
end of the string as matching a \W.
</from Perl manual>
Thus, regexp "ex" matches "example" and "text", whereas regexp "\bex"
matches "example" but not "text".
Now, if "\b" occurs at the beginning (or end) of the regexp, I think
it's easy to rewrite the regexp without using "\b". For example,
"\bex" could be rewritten as "\Wex".
But what if "\b" occurs within the regexp? For example, how to get rid
of "\b" in "<RE>\bex" (with "<RE>" being any regexp)? "<RE>\Wex"
wouldn't work here: for example (with "<RE>" = "\W"), "\W\Wex" is not
equivalent to "\W\bex".
So, is there an algorithm that transforms any regexp into an
equivalent regexp that contains no "\b"?
NOTE: In this message, by "regexp" I mean regular expressions in their
traditional meaning (= empty set, empty string, single symbol, union,
concatenation, Kleene star, and nothing else).
-- dave
Return to the
comp.compilers page.
Search the
comp.compilers archives again.