Re: Leftmost longest match with DFA search

Danny.Dube@ift.ulaval.ca (=?iso-8859-1?q?Danny_Dub=E9?=)
13 May 2008 13:58:56 -0400

          From comp.compilers

Related articles
Leftmost longest match with DFA search monnier@iro.umontreal.ca (Stefan Monnier) (2008-05-10)
Re: Leftmost longest match with DFA search daniel2villeneuve@videotron.ca (Daniel Villeneuve) (2008-05-11)
Re: Leftmost longest match with DFA search rsc@swtch.com (Russ Cox) (2008-05-12)
Re: Leftmost longest match with DFA search monnier@iro.umontreal.ca (Stefan Monnier) (2008-05-13)
Re: Leftmost longest match with DFA search monnier@iro.umontreal.ca (Stefan Monnier) (2008-05-13)
Re: Leftmost longest match with DFA search Danny.Dube@ift.ulaval.ca (2008-05-13)
Re: Leftmost longest match with DFA search rsc@swtch.com (Russ Cox) (2008-05-14)
Re: Leftmost longest match with DFA search Danny.Dube@ift.remove.ulaval.remove.ca (2008-05-15)
| List of all articles for this month |

From: Danny.Dube@ift.ulaval.ca (=?iso-8859-1?q?Danny_Dub=E9?=)
Newsgroups: comp.compilers
Date: 13 May 2008 13:58:56 -0400
Organization: Compilers Central
References: 08-05-026
Keywords: lex, DFA
Posted-Date: 14 May 2008 11:56:34 EDT

Stefan Monnier <monnier@iro.umontreal.ca> writes:


> Can someone point me to articles that discuss various ways to get the
> leftmost longest match when implementing regexp search using a DFA?
>
> The "obvious" solution of turning the problem "search for RE" into the
> problem "match .*RE" (where I use "match" here to mean "anchored
> search") only gives you the leftmost shortest match.
>
> I've already seen http://compilers.iecc.com/comparch/article/07-10-026
> where Tcl's approach to the problem is described, which is sadly O(N^2).
>
> I'm also interested in similar approaches using non-backtracking NFAs
> (I'm familiar for example with the algorithm used in Plan9's regexp
> library).


I'm not sure whether there is confusion between "leftmost longest
match" and "longest leftmost match". For me, "leftmost longest match"
refers to the leftmost of the longest matches while "longest leftmost
match" refers to the longest of the leftmost matches, provided we
define what it means to be the "leftmost".


The "leftmost longest match" is defined this way:


    1. Let { (p1, q1), ..., (pK, qK) } be the matches for RE in text W,
          where pi is the beginning position (inclusive) and qi is the
          ending position (exclusive) of the ith match. This set might be
          empty.


    2. If the set is non-empty, keep only the longest matches,
          i.e. those with maximal qi - pi.


    3. Finally, keep the leftmost one, i.e. the one with minimal pi.


The "longest leftmost match" means one of two things depending on the
point that you use to indicate the "position" of the match. The
position can be given by the beginning of the match or by its end.
The "longest begin-leftmost match" is defined this way:


    1. Let { (p1, q1), ..., (pK, qK) } be the matches for RE in text W.


    2. If the set is non-empty, keep only the begin-leftmost matches,
          i.e. those with minimal pi.


    3. Finally, keep the longest one.


and the "longest end-leftmost match" is defined this way:


    1. Let { (p1, q1), ..., (pK, qK) } be the matches for RE in text W.


    2. If the set is non-empty, keep only the end-leftmost matches,
          i.e. those with minimal qi.


    3. Finally, keep the longest one.


Can you be more precise about what you want? I think it's possible to
get a linear-time algorithm for any of these searches (i.e. in O(|W|)
time for a fixed RE).


Danny


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.