From: | "Scott Nicol" <snicol@apk.net> |
Newsgroups: | comp.compilers |
Date: | 20 Oct 2002 22:50:28 -0400 |
Organization: | APK Net |
References: | 02-10-068 |
Keywords: | lex |
Posted-Date: | 20 Oct 2002 22:50:28 EDT |
> [I know there are Unicode versions of lex, such as the one from plan
> 9.
Not according to the docs: http://www.cs.bell-labs.com/magic/man2html/1/lex
(look under bugs)
> And yes, you only need to store valid transitions. One technique
> is to store the highest and lowest valid tokens in each state and a
> vector of transitions [lowest,highest]. -John]
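The range-vector idea in the quoted note can be sketched roughly as follows. This is only an illustration under assumptions: the names build_range_row, range_next_state and NO_TRANSITION are hypothetical, states are plain integers, and "tokens" is read as input character codes.

```python
NO_TRANSITION = -1  # hypothetical marker for "no valid transition"


def build_range_row(transitions):
    """Build one state's row from a dict {character code: next state}.

    Only the span [lowest, highest] of valid codes is stored; codes
    inside the span that have no transition get NO_TRANSITION.
    """
    if not transitions:
        return (0, -1, [])  # empty span: every lookup falls outside it
    lowest = min(transitions)
    highest = max(transitions)
    vector = [transitions.get(code, NO_TRANSITION)
              for code in range(lowest, highest + 1)]
    return (lowest, highest, vector)


def range_next_state(row, code):
    """Look up the next state, rejecting codes outside [lowest, highest]."""
    lowest, highest, vector = row
    if lowest <= code <= highest:
        return vector[code - lowest]
    return NO_TRANSITION
```

The cost per state is proportional to the distance between the lowest and highest valid code, so a state with a few widely scattered valid characters still gets a large vector.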
Another technique, similar to the above, would be to use a 2-level table:
high-order 8 bits followed by low-order 8. If all the transitions (valid or
not) within a high-8 value are the same, code the transition directly in the
first-level table. If there are differences, refer to a second table indexed
by the low-8 bits. This has the advantage of quick lookup (at most 2 array
dereferences), but the disadvantage of not working for character codes beyond
16 bits (which is where Unicode is heading).
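
A minimal sketch of the 2-level table, assuming 16-bit character codes and integer state numbers. The names build_two_level_row, two_level_next_state and NO_TRANSITION are made up for illustration and do not come from any lex implementation.

```python
NO_TRANSITION = -1  # hypothetical marker for "no valid transition"


def build_two_level_row(transitions):
    """Build one state's 256-entry first-level table.

    transitions maps a 16-bit character code to the next state.
    Each first-level entry is either a state number (all 256 codes
    sharing that high byte behave the same) or a 256-entry list
    indexed by the low byte.
    """
    pages = {}
    for code, target in transitions.items():
        if not 0 <= code <= 0xFFFF:
            raise ValueError("code does not fit in 16 bits: %r" % code)
        pages.setdefault(code >> 8, {})[code & 0xFF] = target

    row = []
    for high in range(256):
        page = pages.get(high)
        if page is None:
            row.append(NO_TRANSITION)  # uniformly invalid
            continue
        low_table = [page.get(low, NO_TRANSITION) for low in range(256)]
        if all(t == low_table[0] for t in low_table):
            row.append(low_table[0])   # uniform: code the transition here
        else:
            row.append(low_table)      # mixed: defer to the low-8 table
    return row


def two_level_next_state(row, code):
    """At most two indexing steps: high byte, then (if needed) low byte."""
    entry = row[code >> 8]
    if isinstance(entry, list):
        return entry[code & 0xFF]
    return entry
```

In C the first-level entry would more likely be a tagged integer or an index into a pool of second-level tables rather than a type check, but the lookup has the same shape. A state that only distinguishes ASCII characters needs a single second-level table, for high byte 0.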
--
Scott Nicol
snicol@apk.net