Multibyte lexers in flex?

From: fussylizard@my-deja.com
Newsgroups: comp.compilers
Date: 10 May 2000 02:51:56 -0400
Organization: Deja.com - Before you buy.
Keywords: lex, i18n, comment

Does anyone have any experience in or tricks for developing scanners
in flex (or a variant) that support multibyte characters? I am
interested in developing a lexer (actually extending an existing one)
that will have to support different code pages at runtime. So, for
example, I would like to recognize patterns such as:
KEYWORD = VALUE
where VALUE can contain multibyte characters in the current codepage
(Japanese Shift-JIS, EUC-JP, etc. depending on where the executable is
running).
I realize there are some ways around this by writing patterns such
as:
    KEYWORD[ \t]*=   { BEGIN(MB_MODE); }
    <MB_MODE>.+      { /* punt to some external fcn to
                          handle the multibyte string */ }
but this is somewhat ugly and requires me to be very careful about how
I write my patterns. Anyone have any ideas?
Thanks,
Chris
[This has come up before. In its usual 8-bit transparent mode, lex handles
multibyte characters just fine as multi-character sequences. -John]
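
As a rough illustration of the moderator's point, here is a minimal C sketch of the byte-level view an 8-bit scanner has of Shift-JIS text. It is not taken from the post or from flex itself; the function names `sjis_char_len` and `sjis_count_chars` are hypothetical, and the sketch covers Shift-JIS only (EUC-JP would need different byte ranges). The same two-byte rule could presumably be written as a flex pattern along the lines of `[\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc]`, so that a lead byte and its trail byte are always consumed together.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the Shift-JIS character starting at p, with n bytes
   remaining. Returns 1 or 2, or 0 if the sequence is truncated or invalid.
   (Hypothetical helper, for illustration only.) */
size_t sjis_char_len(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    if (n == 0)
        return 0;

    unsigned char lead = p[0];

    /* Single-byte characters: ASCII range and half-width katakana. */
    if (lead <= 0x7F || (lead >= 0xA1 && lead <= 0xDF))
        return 1;

    /* Lead byte of a two-byte character. */
    if ((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC)) {
        if (n < 2)
            return 0;                   /* truncated sequence */
        unsigned char trail = p[1];
        /* The trail byte may fall in the ASCII range (e.g. 0x5C), which is
           why lead and trail must be consumed as one unit. */
        if ((trail >= 0x40 && trail <= 0x7E) ||
            (trail >= 0x80 && trail <= 0xFC))
            return 2;
        return 0;                       /* invalid trail byte */
    }

    return 0;                           /* 0x80, 0xA0, 0xFD-0xFF */
}

/* Count the characters in an n-byte buffer.
   Returns (size_t)-1 if the buffer is not well-formed Shift-JIS. */
size_t sjis_count_chars(const unsigned char *s, size_t n)
{
    size_t count = 0;
    while (n > 0) {
        size_t len = sjis_char_len(s, n);
        if (len == 0)
            return (size_t)-1;
        s += len;
        n -= len;
        count++;
    }
    return count;
}
```

A function like this could serve as the "external fcn" in the start-condition approach above. The design choice worth noting is that the scanner never needs to know about characters as such: it only needs rules that keep a trail byte such as 0x5C from being matched on its own as a backslash.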