|right context in scanner generators (was Re: LEX and YACC) email@example.com (1989-11-12)|
|Re: right context in scanner generators (was Re: LEX and YACC) firstname.lastname@example.org (1989-11-12)|
|Re: right context in scanner generators (was Re: LEX and YACC) email@example.com (1989-11-13)|
|Re: right context in scanner generators (was Re: LEX and YACC) firstname.lastname@example.org (1989-11-13)|
|From:||email@example.com (Peter C. Damron)|
|Summary:||here's an idea for right context in lex|
|Date:||13 Nov 89 18:20:43 GMT|
|References:||<1989Nov11.firstname.lastname@example.org> <1989Nov12.email@example.com> <1989Nov12.firstname.lastname@example.org>|
|Organization:||University of Washington, Computer Science, Seattle|
In article <1989Nov12.email@example.com> firstname.lastname@example.org (Hans Boehm) writes:
> Trying to match the context expression separately is, at least in some
>theoretical sense, not very efficient. If there is more than one possible
>regular expression that may match, it may be necessary to look for several
>different right contexts at several different starting positions. This would
>involve backtracking within the recognition of a single token. The current
>lex algorithm has the property that it needs to only simulate a single finite
>automaton, independent of the number of regular expressions in the
>specification, and independent of whether or not they include right context.
>It does need to back up in the input at the end of every token recognition to
>find the longest match, but this only happens once per token. It's a simple
>scan to find the last time a the FA was in a final state. Right context is
>implemented by filtering out certain final states.
I don't know how lex matches right context, but here is my suggestion.
1. Take all the regular expressions for a given meta-state, and tack on
the right context, remembering where the division is.
2. Build the finite automaton which has two types of final states,
one for the end of a pattern and one for the end of a right context.
3. Run the scanner over an input string until the automata hits an error
state. During the scan, push all the final states reached onto a stack
along with the corresponding string position.
4. Examine the stack to find the first (longest) end-of-pattern final state
that also has a corresponding end-of-right-context final state.
5. Restart the scan at the position of the end of pattern final state.
There is no need to build a reverse string matcher. You do have to re-scan
the right context of the matching pattern. However, you may be able to
build the automata in such a way that you can figure out the new final state
stack and the new starting state without any re-scan. I have no idea how
difficult that might be to compute.
Hope this helps (and works),
Peter C. Damron, Dept. of Computer Science, FR-35
University of Washington, Seattle, WA 98195
Return to the
Search the comp.compilers archives again.