Mon, 01 Oct 2007 15:15:42 -0400

Related articles |
---|

Henry Spencer's "Tcl" Regex Library cfc@shell01.TheWorld.com (Chris F Clark) (2007-10-01) |

Re: Henry Spencer's "Tcl" Regex Library rsc@swtch.com (Russ Cox) (2007-10-02) |

Re: Henry Spencer's "Tcl" Regex Library cfc@TheWorld.com (Chris F Clark) (2007-10-02) |

From: | Chris F Clark <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | Mon, 01 Oct 2007 15:15:42 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

Keywords: | lex, question |

Posted-Date: | 02 Oct 2007 00:43:27 EDT |

Michela Becchi, a friend and PhD student I am working with on regex

related matters and who is publishing a paper on Hybrid NFA-DFAs, got

an interesting review comment about the above mentioned library--that

it in fact implements hybrid NFA-DFAs. (I have copied Henry Spencer

via email, so that he can respond if not too busy. However, I'd also

like other insights if they are available.)

In any case, Michela's work has been in devising ways of splitting a

regex problem into sub-problems where some of the problem is solved by

a DFA and some of the problem is solved by an NFA. Her work results

in a very clever partitioning of the problem such that the resulting

machine runs at roughly DFA speeds while taking roughly NFA space. (I

believe the paper on that aspect has already been published, this is

follow-on work solving an interesting related problem.)

The question I'm curious to is whether the above library does the same

thing, i.e. partition a regex problem into a DFA portion and an NFA

portion and whether it does this statically or dynamically. Her work

does it statically, based upon the characteristics of the problem.

However, I believe there are regex libraries that incrementally

generate a DFA from an NFA dynamically as the program is running. (I

believe this is the essence of Thompson's NFA algorithm, which is

distinct from his subset construction algorithm.) That method would

have similar effects in general, but the approach and rationale is

different. For example, the dynamic version only generated the

portion of the DFA it uses. However, if it doesn't take advantage of

the static insights that Michela's approach does, it could potentially

generate an exponentially large machine. Moreover, the two approaches

are complimentary, one could generate a Hybrid machine with the

interesting characteristics that she outlined on-the-fly.

In any case, if there are hybrid NFA-DFA solutions out there (or in

progress), we would like to know what they are and how they work (and

what problems they solve).

Thanks for any input,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

23 Bailey Rd voice : (508) 435-5016

Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.