Related articles |
---|
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06) |
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13) |
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16) |
Re: How to rewrite a regexp without word boundaries? dot@dotat.at (Tony Finch) (2009-08-16) |
From: dave_140390@hotmail.com
Newsgroups: comp.compilers,comp.theory
Date: Mon, 6 Jul 2009 13:43:54 -0700 (PDT)
Organization: Compilers Central
References: 09-07-003 09-07-004
Keywords: lex, DFA
Posted-Date: 06 Jul 2009 17:22:33 EDT
> > I have been wondering, with limited success, how to rewrite a regexp
> > without word boundaries.
>
> Why do you want to? Most likely, the answer is that your regexps are
> getting too clever and thus too unreadable/bug-prone, so you should
> break them up and use more ordinary programming instead.
The regexps are not mine... Sorry, I should have explained. I am
actually writing a tool that takes regexps as input and transforms
them internally into NFAs/DFAs. Since the regexps are not really in my
hands, I should be ready for weird regexps - for example, regexps with
"\b" preceded or followed by other regexps. And I don't know how to
transform a regexp that contains "\b" at an arbitrary position into an
equivalent NFA/DFA.
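
One possible direction, not proposed anywhere in this thread and offered only as a sketch: keep "\b" in the automaton as an epsilon edge that may be taken only when the previous and next characters differ in "wordness". The hand-built NFA for "\W\bex" below, the names MOVES, BOUNDARY_EDGES, closure and accepts, and the ASCII-style definition of a word character are all assumptions made for illustration. The simulation peeks at the next character, so it is not yet a plain DFA; folding that lookahead into the states is left open.

```python
def is_word(ch):
    # ASCII-style approximation of \w; None means "outside the string".
    return ch is not None and (ch.isalnum() or ch == "_")

# Hypothetical hand-built NFA for \W\bex:
#   0 --\W--> 1 --\b--> 2 --e--> 3 --x--> 4 (accepting)
MOVES = {
    0: [(lambda c: not is_word(c), 1)],
    2: [(lambda c: c == "e", 3)],
    3: [(lambda c: c == "x", 4)],
}
BOUNDARY_EDGES = {1: [2]}  # epsilon edges that may only be taken at a word boundary
START, ACCEPT = 0, 4

def closure(states, prev, nxt):
    """Add states reachable over boundary edges enabled between prev and nxt."""
    if is_word(prev) == is_word(nxt):
        return set(states)  # not at a word boundary, guarded edges stay closed
    seen, stack = set(states), list(states)
    while stack:
        for target in BOUNDARY_EDGES.get(stack.pop(), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def accepts(text):
    """Whole-string match, comparable to re.fullmatch on the same pattern."""
    current = {START}
    for i in range(len(text) + 1):
        prev = text[i - 1] if i > 0 else None
        nxt = text[i] if i < len(text) else None
        current = closure(current, prev, nxt)
        if nxt is not None:
            # Consume one character along the ordinary transitions.
            current = {t for s in current
                       for test, t in MOVES.get(s, ()) if test(nxt)}
    return ACCEPT in current
```

With this sketch, accepts(" ex") holds, while accepts("  ex") and accepts("aex") do not, which is also what "\W\bex" does when matched against the whole string.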
> > Now, if "\b" occurs at the beginning (or end) of the regexp, I think
> > it's easy to rewrite the regexp without using "\b". For example,
> > "\bex" could be rewritten as "\Wex".
>
> No, that would not match "ex" at the beginning of the string being
> matched. You could use (?:^|\W)ex.
You are right. My mistake.
> But the matched substring (Perl's
> $&) will differ from that with \bex, so it depends on how the regexp is used.
I am not interested in the matched substring, so "(?:^|\W)ex" (or
rather the NFA/DFA corresponding to "(?:^|\W)ex") is fine... in this
particular case.
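
The behaviour of the three patterns can be checked quickly. The sketch below uses Python's re module, which is an assumption of this example (the thread only mentions Perl); the helper name found is made up for illustration.

```python
import re

def found(pattern, text):
    """Return the matched substring of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# "\Wex" misses "ex" at the very start of the string ...
print(found(r"\Wex", "ex"))
# ... while "(?:^|\W)ex" finds it, just as "\bex" does.
print(found(r"(?:^|\W)ex", "ex"))
print(found(r"\bex", "ex"))

# Both patterns find a match in " ex", but the matched substrings differ:
# "(?:^|\W)ex" also consumes the preceding non-word character.
print(repr(found(r"\bex", " ex")))
print(repr(found(r"(?:^|\W)ex", " ex")))
```

So the two patterns agree on whether a match exists, which is all that matters here, and disagree only on what text is matched.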
> > But what if "\b" occurs within the regexp? For example, how to get rid
> > of "\b" in "<RE>\bex" (with "<RE>" being any regexp)? "<RE>\Wex"
> > wouldn't work here: for example (with "<RE>" = "\W"), "\W\Wex" is not
> > equivalent to "\W\bex".
>
> Rewrite <RE> to a regexp which ends with \W or \W|^ and is equivalent
> in the cases where it is followed by \b\w.
Hmm, not so simple in the general case, but I will have to think about
this possibility.
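
For two simple choices of "<RE>" the suggested rewrite can at least be spot-checked. The sketch below again assumes Python's re module; the sample strings and the helper name whole are chosen only for illustration, and agreement on a handful of samples is of course not a proof of equivalence.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

samples = [" ex", "  ex", "-ex", "aex", "ex", "\nex", "_ex"]

# <RE> = \W: the naive rewrite "\W\Wex" demands two non-word characters
# and therefore differs from "\W\bex" on " ex".
assert whole(r"\W\bex", " ex") and not whole(r"\W\Wex", " ex")

# <RE> = \W already ends with \W, and \b always holds between a non-word
# character and "e", so simply dropping \b agrees on every sample.
assert all(whole(r"\W\bex", s) == whole(r"\Wex", s) for s in samples)

# <RE> = "." : only a non-word character can sit before the boundary,
# and "." excludes newline by default, so "." narrows to "[^\w\n]".
assert all(whole(r".\bex", s) == whole(r"[^\w\n]ex", s) for s in samples)
```

In both cases the rewritten "<RE>" ends with a non-word character class, as suggested above; the hard part remains doing this mechanically for an arbitrary "<RE>".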
-- dave