13 May 2008 13:58:56 -0400

Related articles |
---|

Leftmost longest match with DFA search monnier@iro.umontreal.ca (Stefan Monnier) (2008-05-10) |

Re: Leftmost longest match with DFA search daniel2villeneuve@videotron.ca (Daniel Villeneuve) (2008-05-11) |

Re: Leftmost longest match with DFA search rsc@swtch.com (Russ Cox) (2008-05-12) |

Re: Leftmost longest match with DFA search monnier@iro.umontreal.ca (Stefan Monnier) (2008-05-13) |

Re: Leftmost longest match with DFA search monnier@iro.umontreal.ca (Stefan Monnier) (2008-05-13) |

Re: Leftmost longest match with DFA search Danny.Dube@ift.ulaval.ca (2008-05-13) |

Re: Leftmost longest match with DFA search rsc@swtch.com (Russ Cox) (2008-05-14) |

Re: Leftmost longest match with DFA search Danny.Dube@ift.remove.ulaval.remove.ca (2008-05-15) |

From: | Danny.Dube@ift.ulaval.ca (=?iso-8859-1?q?Danny_Dub=E9?=) |

Newsgroups: | comp.compilers |

Date: | 13 May 2008 13:58:56 -0400 |

Organization: | Compilers Central |

References: | 08-05-026 |

Keywords: | lex, DFA |

Posted-Date: | 14 May 2008 11:56:34 EDT |

Stefan Monnier <monnier@iro.umontreal.ca> writes:

*> Can someone point me to articles that discuss various ways to get the*

*> leftmost longest match when implementing regexp search using a DFA?*

*>*

*> The "obvious" solution of turning the problem "search for RE" into the*

*> problem "match .*RE" (where I use "match" here to mean "anchored*

*> search") only gives you the leftmost shortest match.*

*>*

*> I've already seen http://compilers.iecc.com/comparch/article/07-10-026*

*> where Tcl's approach to the problem is described, which is sadly O(N^2).*

*>*

*> I'm also interested in similar approaches using non-backtracking NFAs*

*> (I'm familiar for example with the algorithm used in Plan9's regexp*

*> library).*

I'm not sure whether there is confusion between "leftmost longest

match" and "longest leftmost match". For me, "leftmost longest match"

refers to the leftmost of the longest matches while "longest leftmost

match" refers to the longest of the leftmost matches, provided we

define what it means to be the "leftmost".

The "leftmost longest match" is defined this way:

1. Let { (p1, q1), ..., (pK, qK) } be the matches for RE in text W,

where pi is the beginning position (inclusive) and qi is the

ending position (exclusive) of the ith match. This set might be

empty.

2. If the set is non-empty, keep only the longest matches,

i.e. those with maximal qi - pi.

3. Finally, keep the leftmost one, i.e. the one with minimal pi.

The "longest leftmost match" means one of two things depending on the

point that you use to indicate the "position" of the match. The

position can be given by the beginning of the match or by its end.

The "longest begin-leftmost match" is defined this way:

1. Let { (p1, q1), ..., (pK, qK) } be the matches for RE in text W.

2. If the set is non-empty, keep only the begin-leftmost matches,

i.e. those with minimal pi.

3. Finally, keep the longest one.

and the "longest end-leftmost match" is defined this way:

1. Let { (p1, q1), ..., (pK, qK) } be the matches for RE in text W.

2. If the set is non-empty, keep only the end-leftmost matches,

i.e. those with minimal qi.

3. Finally, keep the longest one.

Can you be more precise about what you want? I think it's possible to

get a linear-time algorithm for any of these searches (i.e. in O(|W|)

time for a fixed RE).

Danny

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.