Related articles |
---|
right context in scanner generators (was Re: LEX and YACC) boehm@flora.rice.edu (1989-11-12) |
Re: right context in scanner generators (was Re: LEX and YACC) boehm@flora.rice.edu (1989-11-12) |
Re: right context in scanner generators (was Re: LEX and YACC) vern@cs.cornell.edu (1989-11-13) |
Re: right context in scanner generators (was Re: LEX and YACC) peterd@cs.washington.edu (1989-11-13) |
From: | peterd@cs.washington.edu (Peter C. Damron) |
Newsgroups: | comp.compilers |
Summary: | here's an idea for right context in lex |
Date: | 13 Nov 89 18:20:43 GMT |
References: | <1989Nov11.161355.10081@esegue.segue.boston.ma.us> <1989Nov12.041025.12451@esegue.segue.boston.ma.us> <1989Nov12.223642.14219@esegue.segue.boston.ma.us> |
Organization: | University of Washington, Computer Science, Seattle |
In article <1989Nov12.223642.14219@esegue.segue.boston.ma.us> boehm@flora.rice.edu (Hans Boehm) writes:
> Trying to match the context expression separately is, at least in some
>theoretical sense, not very efficient. If there is more than one possible
>regular expression that may match, it may be necessary to look for several
>different right contexts at several different starting positions. This would
>involve backtracking within the recognition of a single token. The current
>lex algorithm has the property that it needs to only simulate a single finite
>automaton, independent of the number of regular expressions in the
>specification, and independent of whether or not they include right context.
>It does need to back up in the input at the end of every token recognition to
>find the longest match, but this only happens once per token. It's a simple
>scan to find the last time a the FA was in a final state. Right context is
>implemented by filtering out certain final states.
I don't know how lex matches right context, but here is my suggestion.
1. Take all the regular expressions for a given meta-state, and tack on
the right context, remembering where the division is.
2. Build the finite automaton which has two types of final states,
one for the end of a pattern and one for the end of a right context.
3. Run the scanner over an input string until the automata hits an error
state. During the scan, push all the final states reached onto a stack
along with the corresponding string position.
4. Examine the stack to find the first (longest) end-of-pattern final state
that also has a corresponding end-of-right-context final state.
5. Restart the scan at the position of the end of pattern final state.
There is no need to build a reverse string matcher. You do have to re-scan
the right context of the matching pattern. However, you may be able to
build the automata in such a way that you can figure out the new final state
stack and the new starting state without any re-scan. I have no idea how
difficult that might be to compute.
Hope this helps (and works),
--
Peter C. Damron, Dept. of Computer Science, FR-35
University of Washington, Seattle, WA 98195
peterd@cs.washington.edu, {ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd
Return to the
comp.compilers page.
Search the
comp.compilers archives again.