Re: looking for Lex/Bison unicode support

"Armel" <>
4 Feb 2000 02:58:19 -0500

          From comp.compilers

Related articles
looking for Lex/Bison unicode support (Yaron Bracha) (2000-01-19)
Re: looking for Lex/Bison unicode support (Quinn Tyler Jackson) (2000-01-21)
Re: looking for Lex/Bison unicode support (Dennis Ritchie) (2000-01-21)
Re: looking for Lex/Bison unicode support (2000-01-23)
Re: looking for Lex/Bison unicode support (Armel) (2000-02-04)

From: "Armel" <>
Newsgroups: comp.compilers
Date: 4 Feb 2000 02:58:19 -0500
Organization: WebID
References: 00-01-081 00-01-087 00-01-096
Keywords: lex, i18n


One way to do this lexing work could be the following:
- use a static algorithm that analyzes all your regular expressions to
find the sets of characters that are obviously equivalent (they are always
seen together, with no other chars; don't forget to merge the alternatives
of an OR between REs that are exactly of length 1)
- compute a correspondence table where table[UNICODECHAR] = [representative
char in the set] (or optimize it with a two-level table, i.e. treat each
block of 256 chars as a line, and if another block behaves the same way
modulo 256, don't duplicate the line, index it)
- then use the representative char in the 8-bit charset in lex and do your
lexing work as usual (but don't use the text sent by lex; use the text of
the same length taken from the real flow of characters)
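
The three steps above can be sketched in Python. This is only an
illustration under stated assumptions, not a finished tool: the character
sets are assumed to be already extracted from the regular expressions, the
representative of a set is arbitrarily its smallest character, and the names
equivalence_classes, representative_table, paged_table and lookup are
hypothetical. The table only covers code points below 0x10000.

```python
def equivalence_classes(char_sets):
    """Group characters that belong to exactly the same input sets.

    Two such characters can never be told apart by any expression,
    so they can share one representative.
    """
    membership = {}
    for index, chars in enumerate(char_sets):
        for ch in chars:
            membership.setdefault(ch, set()).add(index)
    groups = {}
    for ch, indexes in membership.items():
        groups.setdefault(frozenset(indexes), set()).add(ch)
    return sorted(groups.values(), key=min)


def representative_table(classes):
    """table[char] = representative char (assumption: smallest member)."""
    return {ch: min(cls) for cls in classes for ch in cls}


def paged_table(table, other, limit=0x10000):
    """Two-level table: identical 256-entry lines are stored only once.

    `other` is the value used for characters that appear in no set.
    """
    pages = []   # distinct 256-entry lines
    index = []   # index[code_point >> 8] -> position in pages
    seen = {}
    for base in range(0, limit, 256):
        line = tuple(table.get(chr(cp), other) for cp in range(base, base + 256))
        if line not in seen:
            seen[line] = len(pages)
            pages.append(line)
        index.append(seen[line])
    return index, pages


def lookup(index, pages, ch):
    """Map one character (below `limit`) to its representative."""
    cp = ord(ch)
    return pages[index[cp >> 8]][cp & 0xFF]


sets = [set("abcdef"), set("ghijklmnopqrstuvwxyz"), set("ABCDEF")]
classes = equivalence_classes(sets)
table = representative_table(classes)
index, pages = paged_table(table, other="\0")
```

With these three sets only two distinct lines are stored: one for the block
that contains the ASCII letters and one shared by every other block.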

Example :
if you have :


Sets are:
[a-f] [g-z] [A-F]

with representatives:
a g A

and the above expression becomes: [aA]g
If the chars in the original expression are Unicode, it works too.
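
A small Python sketch of the lexing step for this example follows. It starts
from the three sets and the reduced expression [aA]g as given, uses Python's
re module in place of lex, and takes the token text from the real input by
length rather than from the translated text. The names REPRESENTATIVE,
REDUCED and match_token, and the use of "\0" for chars outside every set,
are assumptions made for the illustration.

```python
import re

# table[char] = representative char, built from the three sets of the example
REPRESENTATIVE = {}
for first, last, rep in (("a", "f", "a"), ("g", "z", "g"), ("A", "F", "A")):
    for cp in range(ord(first), ord(last) + 1):
        REPRESENTATIVE[chr(cp)] = rep

# The reduced expression only ever sees representative chars.
REDUCED = re.compile(r"[aA]g")


def match_token(text, pos=0, other="\0"):
    """Match on the translated text, but return the lexeme from the real text.

    The translation is one char to one char, so offsets and lengths are
    the same in both texts.
    """
    reduced = "".join(REPRESENTATIVE.get(ch, other) for ch in text)
    m = REDUCED.match(reduced, pos)
    if m is None:
        return None
    return text[m.start():m.end()]
```

For instance, match_token("Dx!") translates the input to "Ag\0", matches
"Ag", and returns the real text "Dx".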

I know it's not trivial, because you first have to find a regular expression
parser (though it's quite easy to write one) and then recompute the
expressions, but all of this can be done without much trouble.

(I'm personally working on an equivalent of Lex/Yacc, and I'm thinking of
using this technique; if you find it foolish, please tell me :)

