Re: How to rewrite a regexp without word boundaries?

Andrew Tomazos <andrew@tomazos.com>
Tue, 7 Jul 2009 09:33:45 -0700 (PDT)

          From comp.compilers

Related articles
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05)
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06)
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13)
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16)
Re: How to rewrite a regexp without word boundaries? dot@dotat.at (Tony Finch) (2009-08-16)

From: Andrew Tomazos <andrew@tomazos.com>
Newsgroups: comp.compilers,comp.theory
Date: Tue, 7 Jul 2009 09:33:45 -0700 (PDT)
Organization: Compilers Central
References: 09-07-003 09-07-004 09-07-008
Keywords: lex
Posted-Date: 10 Jul 2009 18:39:06 EDT

On Jul 6, 10:43 pm, dave_140...@hotmail.com wrote:
> > > I have been wondering, with limited success, how to rewrite a regexp
> > > without word boundaries.
>
> > Why do you want to? Most likely, the answer is that your regexps are
> > getting too clever and thus too unreadable/bug-prone, so you should
> > break them up and use more ordinary programming instead.
>
> The regexps are not mine... Sorry, I should have explained. I am
> actually writing a tool that takes regexps as input and transforms
> them internally into NFAs/DFAs. Since the regexps are not really in my
> hands, I should be ready for weird regexps - for example, regexps with
> "\b" preceded or followed by other regexps. And I don't know how to
> transform a regexp that contains "\b" at an arbitrary position into an
> equivalent NFA/DFA.


Why don't you study Perl's regex engine to see how it implements "\b"?
It is all open source. I did this a while ago, and it is very
interesting to look at: Perl has one of the most advanced and most
heavily used regex engines around.
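
As a rough illustration of the check a matcher has to make for "\b", here is a minimal Python sketch. It is not Perl's actual code; the helper names (is_word_char, at_word_boundary, boundary_positions) are made up for this example, and it assumes the ASCII definition of a word character (letters, digits, underscore). The point is only that "\b" looks at one character on each side of the current position and consumes nothing.

```python
import re


def is_word_char(ch):
    # Assumption: ASCII-only \w, i.e. letters, digits and underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def at_word_boundary(text, pos):
    # \b holds at pos when exactly one of the two neighbouring
    # characters is a word character. Positions outside the string
    # count as non-word characters.
    before = pos > 0 and is_word_char(text[pos - 1])
    after = pos < len(text) and is_word_char(text[pos])
    return before != after


def boundary_positions(text):
    # Every offset (including the end of the string) where \b would match.
    return [i for i in range(len(text) + 1) if at_word_boundary(text, i)]


def reference_positions(text):
    # Cross-check against Python's own engine in ASCII mode.
    return [m.start() for m in re.finditer(r"\b", text, re.ASCII)]
```

Since the test depends only on whether the previous character was a word character, an automaton can in principle carry that one bit in its state, which is why "\b" by itself does not take a pattern outside the regular languages.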


Also, one thing to note is that in formal language theory "regular
expression" has a well-defined meaning; see the Chomsky hierarchy.
Perl's regular expressions do *not* qualify as regular expressions
under the formal definition, because features such as backreferences
let them match non-regular languages.
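
A small example of the gap, sketched with Python's re module rather than Perl (the backreference syntax "\1" is the same in both; the function name is made up for illustration). The pattern below accepts exactly the strings a^n b a^n, a language that no finite automaton can recognise, so a tool that compiles its input to an NFA/DFA would have to reject or special-case patterns like this one.

```python
import re

# (a*) captures some run of a's, and \1 demands the same run again
# after the b, so the whole pattern matches a^n b a^n.
PATTERN = re.compile(r"(a*)b\1")


def matches_an_b_an(s):
    # fullmatch requires the entire string to match, not just a prefix.
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["b", "aba", "aabaa", "aaba", "abaa"]:
        print(candidate, matches_an_b_an(candidate))
```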
    -Andrew.
[Perl's regex engine is swell, but if performance is an issue it's nowhere
near as fast as a DFA generated by flex or re2c. -John]


