Related articles |
---|
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06) |
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13) |
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16) |
Re: How to rewrite a regexp without word boundaries? dot@dotat.at (Tony Finch) (2009-08-16) |
From: | Hans Aberg <haberg_20080406@math.su.se> |
Newsgroups: | comp.compilers,comp.theory |
Date: | Tue, 07 Jul 2009 01:10:21 +0200 |
Organization: | Aioe.org NNTP Server |
References: | 09-07-003 09-07-004 09-07-008 |
Keywords: | lex, theory |
Posted-Date: | 07 Jul 2009 02:16:26 EDT |
dave_140390@hotmail.com wrote:
>>> I have been wondering, with limited success, how to rewrite a regexp
>>> without word boundaries.
> ...I am
> actually writing a tool that takes regexps as input and transforms
> them internally into NFAs/DFAs. ... I don't know how to
> transform a regexp that contains "\b" at an arbitrary position into an
> equivalent NFA/DFA.
I don't think that is possible: that is an additional mechanism on top
of the reg-ex structure.
Formally, let C be the character set, and C* the set of strings (the
free monoid on the set C). Then a language L is a subset of C*. The
regular expressions are operations on subsets of C* that define
languages. If L is a regular expression language, then a string either
matches - it is a subset of L - or it does not.
Now, when used in a regular expression matcher program, one selects a
parsing point in the string and looks at the set of strings of
successive lengths from that point. Usually, the match is the longest of
those strings that are in L. But there is a choice, which does not
depend on L.
So I think you need to implement a mechanism without the NFA/DFA
structure that identifies the \b boundaries.
Hans
Return to the
comp.compilers page.
Search the
comp.compilers archives again.