Re: How to rewrite a regexp without word boundaries?

Hans Aberg <haberg_20080406@math.su.se>
Tue, 07 Jul 2009 01:10:21 +0200

          From comp.compilers

Related articles
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05)
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06)
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13)
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16)
Re: How to rewrite a regexp without word boundaries? dot@dotat.at (Tony Finch) (2009-08-16)
| List of all articles for this month |

From: Hans Aberg <haberg_20080406@math.su.se>
Newsgroups: comp.compilers,comp.theory
Date: Tue, 07 Jul 2009 01:10:21 +0200
Organization: Aioe.org NNTP Server
References: 09-07-003 09-07-004 09-07-008
Keywords: lex, theory
Posted-Date: 07 Jul 2009 02:16:26 EDT

dave_140390@hotmail.com wrote:
>>> I have been wondering, with limited success, how to rewrite a regexp
>>> without word boundaries.


> ...I am
> actually writing a tool that takes regexps as input and transforms
> them internally into NFAs/DFAs. ... I don't know how to
> transform a regexp that contains "\b" at an arbitrary position into an
> equivalent NFA/DFA.




I don't think that is possible: that is an additional mechanism on top
of the reg-ex structure.


Formally, let C be the character set, and C* the set of strings (the
free monoid on the set C). Then a language L is a subset of C*. The
regular expressions are operations on subsets of C* that define
languages. If L is a regular expression language, then a string either
matches - it is a subset of L - or it does not.


Now, when used in a regular expression matcher program, one selects a
parsing point in the string and looks at the set of strings of
successive lengths from that point. Usually, the match is the longest of
those strings that are in L. But there is a choice, which does not
depend on L.


So I think you need to implement a mechanism without the NFA/DFA
structure that identifies the \b boundaries.


      Hans


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.