Re: Pondering the future of lexical analysis

"Scott Nicol" <snicol@apk.net>
20 Oct 2002 22:50:28 -0400

From comp.compilers

Related articles
Pondering the future of lexical analysis clint@0lsen.net (Clint Olsen) (2002-10-18)
Re: Pondering the future of lexical analysis jmcenerney@austin.rr.com (John McEnerney) (2002-10-20)
Re: Pondering the future of lexical analysis snicol@apk.net (Scott Nicol) (2002-10-20)
Re: Pondering the future of lexical analysis whopkins@alpha2.csd.uwm.edu (Mark) (2002-10-20)
Re: Pondering the future of lexical analysis arnold@skeeve.com (Aharon Robbins) (2002-10-20)


From:	"Scott Nicol" <snicol@apk.net>
Newsgroups:	comp.compilers
Date:	20 Oct 2002 22:50:28 -0400
Organization:	APK Net
References:	02-10-068
Keywords:	lex
Posted-Date:	20 Oct 2002 22:50:28 EDT

> [I know there are Unicode versions of lex, such as the one from Plan 9.

Not according to the docs: http://www.cs.bell-labs.com/magic/man2html/1/lex
(look under bugs)

> And yes, you only need to store valid transitions. One technique
> is to store the highest and lowest valid tokens in each state and a
> vector of transitions [lowest,highest]. -John]
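
As a rough sketch of the [lowest,highest] scheme quoted above: each state
keeps the bounds of its valid range and a vector covering only that range.
The names state_row and range_next, the -1 "no transition" marker, and the
digit example are all made up for illustration, not taken from any
particular lex implementation.

```c
#include <stddef.h>

#define NO_TRANSITION (-1)   /* assumed marker for "no valid transition" */

/* One DFA state: transitions are stored only for characters in
   [lowest, highest]; anything outside that range is invalid. */
struct state_row {
    unsigned lowest;     /* lowest character with a valid transition */
    unsigned highest;    /* highest character with a valid transition */
    const int *next;     /* highest - lowest + 1 entries */
};

/* Return the next state for character c, or NO_TRANSITION. */
static int range_next(const struct state_row *row, unsigned c)
{
    if (c < row->lowest || c > row->highest)
        return NO_TRANSITION;
    return row->next[c - row->lowest];
}

/* Hypothetical example: a state that accepts only '0'..'9',
   all of which lead to state 1. */
static const int digit_next[10] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
static const struct state_row digit_row = { '0', '9', digit_next };
```

The vector costs 10 entries here no matter how wide the character set is,
but a state with valid characters at both ends of the range still pays for
everything in between.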

Another technique, similar to the above, is a two-level table indexed first
by the high-order 8 bits of the character and then by the low-order 8 bits.
If every transition (valid or not) under a given high-order byte is the
same, code that transition directly in the first-level entry. If they
differ, the entry refers to a second table indexed by the low-order byte.
This gives a quick lookup (at most 2 array dereferences), but as described
it does not extend beyond 16-bit characters, which is where Unicode is
heading.
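
Here is a minimal sketch of that two-level lookup, to make the layout
concrete. All the names (high_entry, state2, two_level_next, demo_state),
the use of -1 for "no transition", and the demo contents are assumptions
for illustration. A real generator would emit these tables statically
rather than fill them in at run time.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_TRANSITION (-1)   /* assumed marker for "no valid transition" */

/* First level: one entry per high-order byte of a 16-bit character. */
struct high_entry {
    const int16_t *low;  /* second-level table of 256 entries, or NULL */
    int16_t next;        /* shared next state, used when low == NULL */
};

/* One DFA state. */
struct state2 {
    struct high_entry high[256];
};

/* Next state for 16-bit character c: at most two array dereferences. */
static int two_level_next(const struct state2 *s, uint16_t c)
{
    const struct high_entry *e = &s->high[c >> 8];
    if (e->low == NULL)
        return e->next;          /* all 256 low bytes agree */
    return e->low[c & 0xFF];     /* they differ: consult second level */
}

/* Hypothetical demo state: ASCII digits go to state 1, every character
   in U+4E00..U+4EFF goes to state 2, everything else is invalid. */
static const struct state2 *demo_state(void)
{
    static struct state2 s;
    static int16_t low00[256];
    static int ready = 0;
    int i;

    if (!ready) {
        for (i = 0; i < 256; i++) {
            s.high[i].low = NULL;
            s.high[i].next = NO_TRANSITION;
            low00[i] = NO_TRANSITION;
        }
        for (i = '0'; i <= '9'; i++)
            low00[i] = 1;
        s.high[0x00].low = low00;  /* mixed block: needs second level */
        s.high[0x4E].next = 2;     /* uniform block: coded directly */
        ready = 1;
    }
    return &s;
}
```

With this layout, two_level_next(demo_state(), '7') goes through low00,
while a character such as 0x4E2D is resolved from the first-level entry
alone. The uint16_t parameter is exactly the 16-bit limit mentioned above.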

--
Scott Nicol
snicol@apk.net
