Re: '^' and '$' in Regular expression

"Armel" <armelasselin@hotmail.com>
Thu, 22 Apr 2010 17:11:33 +0200

          From comp.compilers

Related articles
'^' and '$' in Regular expression march1896@gmail.com (Tangel) (2010-04-21)
Re: '^' and '$' in Regular expression cfc@shell01.TheWorld.com (Chris F Clark) (2010-04-21)
Re: '^' and '$' in Regular expression march1896@gmail.com (Tangel) (2010-04-21)
Re: '^' and '$' in Regular expression mailings@jmksf.com (mailings@jmksf.com) (2010-04-22)
Re: '^' and '$' in Regular expression armelasselin@hotmail.com (Armel) (2010-04-22)
Re: '^' and '$' in Regular expression cfc@shell01.TheWorld.com (Chris F Clark) (2010-04-22)
Re: '^' and '$' in Regular expression quinn_jackson2004@yahoo.ca (Quinn Tyler Jackson) (2010-04-22)
Re: '^' and '$' in Regular expression cfc@shell01.TheWorld.com (Chris F Clark) (2010-04-22)
| List of all articles for this month |

From: "Armel" <armelasselin@hotmail.com>
Newsgroups: comp.compilers
Date: Thu, 22 Apr 2010 17:11:33 +0200
Organization: Compilers Central
References: 10-04-052
Keywords: lex, DFA
Posted-Date: 22 Apr 2010 12:52:56 EDT

"Tangel" <march1896@gmail.com> a icrit dans le message de news:
> I am writing a regexp lib which can be accessed on
> http://code.google.com/p/snev/source/browse/#svn/trunk/tlab/regexp/new.
> It use the theory in dragon book, regexp->NFA->DFA. but when I try to
> match the '^'(begin of line) and '$'(end of line), some problems
> occurs.
> Other symbols except ^ and $ are 'really' characters, they can be
> the weights of the edges, but ^ and $ are not.
> And I try to match ^ and $ as '\n', and add '\n' to the ends of the
> text, but it seems not a prefect solution.


It cannot work that way because the DFA is expected to contain only _real_
characters, not constraints. for example, you won't be able either to make
sub expressions or positive/negative look-ahead either. Sub expressions in
bare DFA are possible only in a subset of the cases that you generally want
to represent in regular expressions.
for the sub-expressions, DFA cannot express a proper limit between what is
inside and what is outside in general case (it is not possible to put
'go-in/go-out' labels on transitions or states because they can mix RE-nodes
inside and outside, in particular because of nodes jointure based only on
what can follow)
for the constraints, you cannot because a bare DFA as explained in Dragon
book cannot match something without progressing in the text: one transition
= one advance.


you might be able to code contraints though by using attributed labels (e.g.
'^a' would a different transition then 'a'), you then would have to define
carefully if you can and how to join transitions not sharing same
attributes, in addition, you might use go-in/go-out notations on the
first/end transitions on constraints by enforcing your user to use only
non-empty, non-nesting constraints and keeping a memory were last opened
constraint started.


HIH
Armel



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.