Re: Wanted: Regular Expression -> Finite Automata C code =-

henry@zoo.toronto.edu (Henry Spencer)
Mon, 12 Apr 1993 19:50:34 GMT

          From comp.compilers

Related articles
Re: Wanted: Regular Expression -> Finite Automata C code =- tdh6t@helga3.acc.virginia.edu (Todd Hodes) (1993-04-05)
Re: Wanted: Regular Expression -> Finite Automata C code =- megatest!plethorax!djones@uu2.psi.com (1993-04-09)
Re: Wanted: Regular Expression -> Finite Automata C code =- henry@zoo.toronto.edu (1993-04-12)
Re: Wanted: Regular Expression -> Finite Automata C code =- tdh6t@onyx.cs.virginia.edu (Todd Hodes) (1993-04-14)
| List of all articles for this month |

Newsgroups: comp.compilers
From: henry@zoo.toronto.edu (Henry Spencer)
Keywords: DFA, lex
Organization: U of Toronto Zoology
References: 93-04-023 93-04-041
Date: Mon, 12 Apr 1993 19:50:34 GMT

megatest!plethorax!djones@uu2.psi.com (Dave Jones) writes:
>Also, every time a final state is encountered, you will want to remember
>the number of characters that have been processed at that point. The last
>number remembered will be the length of the longest match.


Unfortunately, this doesn't generalize to most real-life applications,
where the match is not anchored at the start of the string. It's easy to
generalize the matching algorithm so you still make only one pass, by
considering the possibility of a fresh start at each character, and this
is definitely the way to do it -- you don't want to retry the match at
each possible starting position! Alas, the bookkeeping for longest match
falls apart. You don't know when the particular match that corresponds to
a new final state started. Consider the RE "(abcde|cd|ef)" and the string
"xabcdefy". You reach a final state after seeing the "d", another after
the "e", and another after the "f", but the last one is not the longest.
--
Henry Spencer @ U of Toronto Zoology, henry@zoo.toronto.edu utzoo!henry
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.