Re: right context in scanner generators (was Re: LEX and YACC)

peterd@cs.washington.edu (Peter C. Damron)
13 Nov 89 18:20:43 GMT

          From comp.compilers

Related articles
right context in scanner generators (was Re: LEX and YACC) boehm@flora.rice.edu (1989-11-12)
Re: right context in scanner generators (was Re: LEX and YACC) boehm@flora.rice.edu (1989-11-12)
Re: right context in scanner generators (was Re: LEX and YACC) vern@cs.cornell.edu (1989-11-13)
Re: right context in scanner generators (was Re: LEX and YACC) peterd@cs.washington.edu (1989-11-13)
| List of all articles for this month |

From: peterd@cs.washington.edu (Peter C. Damron)
Newsgroups: comp.compilers
Summary: here's an idea for right context in lex
Date: 13 Nov 89 18:20:43 GMT
References: <1989Nov11.161355.10081@esegue.segue.boston.ma.us> <1989Nov12.041025.12451@esegue.segue.boston.ma.us> <1989Nov12.223642.14219@esegue.segue.boston.ma.us>
Organization: University of Washington, Computer Science, Seattle

In article <1989Nov12.223642.14219@esegue.segue.boston.ma.us> boehm@flora.rice.edu (Hans Boehm) writes:
> Trying to match the context expression separately is, at least in some
>theoretical sense, not very efficient. If there is more than one possible
>regular expression that may match, it may be necessary to look for several
>different right contexts at several different starting positions. This would
>involve backtracking within the recognition of a single token. The current
>lex algorithm has the property that it needs to only simulate a single finite
>automaton, independent of the number of regular expressions in the
>specification, and independent of whether or not they include right context.
>It does need to back up in the input at the end of every token recognition to
>find the longest match, but this only happens once per token. It's a simple
>scan to find the last time a the FA was in a final state. Right context is
>implemented by filtering out certain final states.


I don't know how lex matches right context, but here is my suggestion.


1. Take all the regular expressions for a given meta-state, and tack on
        the right context, remembering where the division is.
2. Build the finite automaton which has two types of final states,
        one for the end of a pattern and one for the end of a right context.
3. Run the scanner over an input string until the automata hits an error
        state. During the scan, push all the final states reached onto a stack
        along with the corresponding string position.
4. Examine the stack to find the first (longest) end-of-pattern final state
        that also has a corresponding end-of-right-context final state.
5. Restart the scan at the position of the end of pattern final state.


There is no need to build a reverse string matcher. You do have to re-scan
the right context of the matching pattern. However, you may be able to
build the automata in such a way that you can figure out the new final state
stack and the new starting state without any re-scan. I have no idea how
difficult that might be to compute.


Hope this helps (and works),
--
Peter C. Damron, Dept. of Computer Science, FR-35
University of Washington, Seattle, WA 98195
peterd@cs.washington.edu, {ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.