Re: How to rewrite a regexp without word boundaries?

dave_140390@hotmail.com
Mon, 6 Jul 2009 13:43:54 -0700 (PDT)

          From comp.compilers


From: dave_140390@hotmail.com
Newsgroups: comp.compilers,comp.theory
Date: Mon, 6 Jul 2009 13:43:54 -0700 (PDT)
Organization: Compilers Central
References: 09-07-003 09-07-004
Keywords: lex, DFA
Posted-Date: 06 Jul 2009 17:22:33 EDT

> > I have been wondering, with limited success, how to rewrite a regexp
> > without word boundaries.
>
> Why do you want to? Most likely, the answer is that your regexps are
> getting too clever and thus too unreadable/bug-prone, so you should
> break them up and use more ordinary programming instead.


The regexps are not mine; sorry, I should have explained. I am
actually writing a tool that takes regexps as input and transforms
them internally into NFAs/DFAs. Since I do not control the regexps,
I have to be ready for unusual ones, for example regexps in which
"\b" is preceded or followed by other subexpressions. And I don't know
how to transform a regexp that contains "\b" at an arbitrary position
into an equivalent NFA/DFA.
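
To make the difficulty concrete: "\b" consumes no input; it is a
condition on the characters on either side of a position. A minimal
sketch of that condition in Python follows. The helper names
(is_word, at_word_boundary, and so on) are made up for illustration,
and word characters are assumed to be ASCII letters, digits, and "_".
This is not how any particular tool implements it, only a way to check
the definition against Python's re module.

```python
import re
import string

# Assumption: "\w" means ASCII letters, digits, and underscore.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def is_word(ch):
    return ch in WORD_CHARS

def at_word_boundary(text, i):
    # Position i lies between text[i-1] and text[i]; the ends of the
    # string behave as if a non-word character were there.
    before = i > 0 and is_word(text[i - 1])
    after = i < len(text) and is_word(text[i])
    return before != after

def boundary_positions(text):
    return [i for i in range(len(text) + 1) if at_word_boundary(text, i)]

def re_boundary_positions(text):
    # Reference answer from Python's own engine, restricted to ASCII.
    return [m.start() for m in re.finditer(r"\b", text, flags=re.ASCII)]
```

An automaton would have to carry the "was the previous character a
word character?" bit in its state to evaluate this condition, which is
presumably why "\b" in the middle of a regexp is awkward to remove.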




> > Now, if "\b" occurs at the beginning (or end) of the regexp, I think
> > it's easy to rewrite the regexp without using "\b". For example,
> > "\bex" could be rewritten as "\Wex".
>
> No, that would not match "ex" at the beginning of the string being
> matched. You could use (?:^|\W)ex.


You are right. My mistake.
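
A quick check with Python's re module shows the difference. The sample
strings and the found() helper are mine, chosen only for illustration;
note that without re.MULTILINE, "^" in Python matches only at the start
of the string.

```python
import re

def found(pattern, text):
    # True if the pattern matches anywhere in the text.
    return re.search(pattern, text) is not None

samples = ["ex", "an ex", "apex", "_ex", "-ex"]
patterns = [r"\bex", r"\Wex", r"(?:^|\W)ex"]

# One row of booleans per pattern, one column per sample string.
results = {p: [found(p, s) for s in samples] for p in patterns}

for p in patterns:
    print(f"{p:12} {results[p]}")
```

On these samples "\bex" and "(?:^|\W)ex" agree everywhere, while
"\Wex" misses the bare string "ex".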




> But the matched substring (Perl's
> $&) will differ from that of \bex, so it depends on how the regexp is used.


I am not interested in the matched substring, so "(?:^|\W)ex" (or
rather the NFA/DFA corresponding to "(?:^|\W)ex") is fine... in this
particular case.
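
For the record, the difference in the matched substring is easy to see
in Python, where m.group() plays the role of Perl's $&. The input
string is an arbitrary example of mine.

```python
import re

text = "an ex"

# "\b" is zero-width, so only "ex" is part of the match.
with_boundary = re.search(r"\bex", text).group()

# "\W" consumes the preceding space, so the match is one character longer.
with_nonword = re.search(r"(?:^|\W)ex", text).group()

print(repr(with_boundary), repr(with_nonword))
```

Both regexps accept the same strings here; only the reported match
differs, which does not matter for a pure accept/reject automaton.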




> > But what if "\b" occurs within the regexp? For example, how to get rid
> > of "\b" in "<RE>\bex" (with "<RE>" being any regexp)? "<RE>\Wex"
> > wouldn't work here: for example (with "<RE>" = "\W"), "\W\Wex" is not
> > equivalent to "\W\bex".
>
> Rewrite <RE> to a regexp which ends with \W or \W|^ and is equivalent
> in the cases where it is followed by \b\w.


Hmm, not so simple in the general case, but I will have to think about
this possibility.
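
For the one case discussed above, "<RE>" = "\W", the suggestion works
out simply: a non-word character followed by "e" is always at a word
boundary, so "\W\bex" should accept the same strings as "\Wex", whereas
"\W\Wex" requires two non-word characters. A small Python check, with
sample strings and a matches() helper chosen by me for illustration:

```python
import re

def matches(pattern, text):
    # True if the pattern matches anywhere in the text.
    return re.search(pattern, text) is not None

with_boundary = r"\W\bex"   # the regexp to be rewritten
naive = r"\W\Wex"           # replacing "\b" by "\W": not equivalent
collapsed = r"\Wex"         # dropping the redundant "\b"

samples = ["-ex", "--ex", "ex", "a-ex", "apex"]

# For each sample: (with_boundary, naive, collapsed)
table = {
    s: (matches(with_boundary, s), matches(naive, s), matches(collapsed, s))
    for s in samples
}

for s, row in table.items():
    print(f"{s!r:8} {row}")
```

This only covers "<RE>" = "\W"; it says nothing about how to do the
rewrite for an arbitrary "<RE>".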




-- dave


