Subject: Re: looking for Lex/Bison unicode support
From: "Armel" <webid@asi.fr>
Newsgroups: comp.compilers
Date: 4 Feb 2000 02:58:19 -0500
Organization: WebID
References: 00-01-081 00-01-087 00-01-096
Keywords: lex, i18n
Hello,
One way to do this Unicode lexing work with an 8-bit lex could be the following:
- use a static algorithm that analyzes all your regular expressions to
characterize sets of characters that are equivalent (characters that are
always seen together and never with other characters; don't forget to first
merge alternations of REs of length exactly 1, e.g. treat a|b as [ab])
- compute a correspondence table where table[UNICODECHAR] = [representative
char of its set] (or optimize it with a two-level table: treat each block of
256 characters as a row, and when two rows map their characters identically,
store that row once and index it instead of duplicating it)
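- then use the representative chars, which fit in an 8-bit charset, in your
lex patterns, feed lex the input translated through the table, and do your
lexing work as usual (but don't use the text lex hands back in yytext; take
the text at the same position and of the same length from the real stream
of characters)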
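A minimal sketch of these three steps, assuming the character classes have already been parsed into inclusive (lo, hi) code point ranges. The names `equivalence_classes`, `two_level_table`, `translate`, `MAX_CP` and `ROW` are hypothetical, and instead of a representative letter each set is given a numeric class code (0 to 255) that an 8-bit scanner would see.

```python
MAX_CP = 0x110000  # one past the last Unicode code point
ROW = 256          # row size of the two-level table


def equivalence_classes(char_classes):
    """Split the code space at every range boundary, then give one id to all
    elementary intervals contained in exactly the same character classes."""
    cuts = {0, MAX_CP}
    for ranges in char_classes:
        for lo, hi in ranges:
            cuts.update((lo, hi + 1))
    bounds = sorted(cuts)
    ids, sig_to_id = [], {}
    for start in bounds[:-1]:
        # An elementary interval lies wholly inside or outside each range,
        # so testing its first code point is enough.
        sig = tuple(any(lo <= start <= hi for lo, hi in ranges)
                    for ranges in char_classes)
        ids.append(sig_to_id.setdefault(sig, len(sig_to_id)))
    return bounds, ids


def two_level_table(bounds, ids):
    """Build index/rows so that class(cp) == rows[index[cp // ROW]][cp % ROW];
    identical 256-entry rows are stored only once and shared."""
    flat = []
    for i, class_id in enumerate(ids):
        flat.extend([class_id] * (bounds[i + 1] - bounds[i]))
    index, rows, seen = [], [], {}
    for base in range(0, MAX_CP, ROW):
        row = tuple(flat[base:base + ROW])
        if row not in seen:
            seen[row] = len(rows)
            rows.append(row)
        index.append(seen[row])
    return index, rows


def translate(text, index, rows):
    """Map each character to a one-byte class code for an 8-bit scanner.
    One character becomes one byte, so offsets match the original text."""
    codes = [rows[index[ord(ch) // ROW]][ord(ch) % ROW] for ch in text]
    if any(c > 255 for c in codes):
        raise ValueError("more than 256 classes; an 8-bit scanner cannot cope")
    return bytes(codes)


# Character classes of the example below: [a-fA-F] and [a-z].
classes = [[(ord("a"), ord("f")), (ord("A"), ord("F"))], [(ord("a"), ord("z"))]]
bounds, ids = equivalence_classes(classes)
index, rows = two_level_table(bounds, ids)
source = "Az\u00e9"
scanned = translate(source, index, rows)
# A match of length n at offset k in `scanned` corresponds to source[k:k + n].
```

Here the classes come out as 0 (every character no pattern mentions), 1 for A-F, 2 for a-f and 3 for g-z, and only two distinct 256-entry rows are stored: the first block and one all-zero row shared by every other block.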
Example:
if you have:
[a-fA-F][a-z]
the sets are (plus one set for all other characters):
[a-f] [g-z] [A-F]
with representatives:
a g A
and the above expression becomes: [aA][ag]
If the characters in the original expression are Unicode, it works too.
I know it's not trivial, because you first need a regular expression parser
(though that's quite easy to write), and then you have to rebuild the
expressions, but all of this is straightforward to do.
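The parse-and-rewrite step can be sketched on the example above. This assumes a hypothetical, much simplified syntax (a pattern is just a sequence of bracket expressions holding literal characters and x-y ranges, with no escapes, negation or other operators), and picks each set's smallest code point as its representative, which reproduces a, g and A. For Unicode sets, an unused 8-bit code would have to stand in for a representative that does not fit in 8 bits; this sketch does not do that.

```python
import re


def parse_classes(pattern):
    """Parse '[a-fA-F][a-z]' into lists of inclusive (lo, hi) code point ranges."""
    classes = []
    for body in re.findall(r"\[([^\]]+)\]", pattern):
        ranges, i = [], 0
        while i < len(body):
            if i + 2 < len(body) and body[i + 1] == "-":
                ranges.append((ord(body[i]), ord(body[i + 2])))
                i += 3
            else:
                ranges.append((ord(body[i]), ord(body[i])))
                i += 1
        classes.append(ranges)
    return classes


def rewrite(pattern):
    """Replace each class by the representatives of the equivalence sets it
    covers; a set's representative is its smallest code point."""
    classes = parse_classes(pattern)
    cuts = sorted({lo for r in classes for lo, _ in r}
                  | {hi + 1 for r in classes for _, hi in r})

    def signature(cp):
        # Which classes contain cp; equal signatures mean equivalent chars.
        return tuple(any(lo <= cp <= hi for lo, hi in r) for r in classes)

    rep = {}
    for start in cuts[:-1]:
        sig = signature(start)
        if any(sig):
            rep.setdefault(sig, chr(start))
    out = []
    for ranges in classes:
        reps = []
        for lo, hi in ranges:
            for start in cuts:
                if lo <= start <= hi and rep[signature(start)] not in reps:
                    reps.append(rep[signature(start)])
        out.append("[" + "".join(reps) + "]")
    return "".join(out)


print(rewrite("[a-fA-F][a-z]"))  # [aA][ag]
```

Note that the second class becomes [ag], not g: [a-z] also covers the set [a-f]. Likewise rewrite("[abc][a]") gives "[ab][a]", because b and c are always seen together and collapse into one set.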
(I'm personally working on an equivalent of Lex/Yacc, and I'm thinking of using
this technique; if you find it foolish, please tell me :)
Armel