Related articles |
---|
Multibyte/Wide Character Sets and Lex. juliano@SYDPO4.AUS.unisys.com (Orbach, Julian ACUS) (1996-02-09) |
Re: Multibyte/Wide Character Sets and Lex. colas@aye.inria.fr (1996-02-09) |
Re: Multibyte/Wide Character Sets and Lex. sharris@fox.nstn.ca (Sandy Harris) (1996-02-10) |
Re: Multibyte/Wide Character Sets and Lex. schwartz@galapagos.cse.psu.edu (1996-02-12) |
Re: Multibyte/Wide Character Sets and Lex. pjbumbul@math.uwaterloo.ca (1996-02-13) |
Re: Multibyte/Wide Character Sets and Lex. fjh@cs.mu.OZ.AU (1996-02-13) |
Re: Multibyte/Wide Character Sets and Lex. peter@csgrs6k1.uwaterloo.ca (1996-02-14) |
Re: Multibyte/Wide Character Sets and Lex. mparks@oz.net (Michael Parkes) (1996-02-14) |
Re: Multibyte/Wide Character Sets and Lex. jfc@mit.edu (1996-02-14) |
From: | Michael Parkes <mparks@oz.net> |
Newsgroups: | comp.compilers |
Date: | 14 Feb 1996 21:24:08 -0500 |
Organization: | Sense Networking Seattle (www.oz.net) |
References: | 96-02-065 |
Keywords: | lex, i18n |
"Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com> wrote:
>[Lex handles 7 or 8 bit chars, not 16 bit or wider. How do I lex Japanese?]
>[I don't know of any lex that handles wider than 8 bit characters.
>The extension from 8 to 16 bit lexers isn't straightforward, since
>most 8 bit lexers use the character codes as array indices. That's
>considerably less practical when the arrays are 64K each rather than
>256 words. -John]
In a lot of cases it simply does not matter that characters are
16-bit. Certainly some COBOL compilers just ignore this fact in
many situations. However, in general you are correct to say it is a
complex problem. John points out why it is hard to change most common
lexers. Even if they are modified to use some more economical
algorithm, the next question is usually "why does this lexer run like
a dog?" I know: I have actually tried it, and had a lexer that could
handle 16-bit characters. Needless to say, I changed it to improve
performance.
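
To make the trade-off concrete, here is a rough sketch of one "more
economical" scheme: map each character to a small class number first,
then index the transition table by class instead of by raw 16-bit code.
This is not the lexer I built and not what any particular lex does; the
names (RANGES, char_class, TRANSITIONS, scan_token) and the choice of
character ranges are made up for illustration. Python strings are
indexed by code point, which matches 16-bit code units only for
characters below U+10000.

```python
from bisect import bisect_right

# A dense row indexed by raw 16-bit code needs this many entries per state.
DENSE_ROW_ENTRIES = 1 << 16

# Sorted, non-overlapping code ranges: (first, last, class_id).
# Class 0 means "anything else". Illustrative ranges only (assumption).
RANGES = [
    (0x0030, 0x0039, 1),  # ASCII digits
    (0x0041, 0x005A, 2),  # ASCII upper-case letters
    (0x0061, 0x007A, 2),  # ASCII lower-case letters
    (0x3041, 0x3096, 2),  # hiragana, treated as letters
    (0x30A1, 0x30FA, 2),  # katakana, treated as letters
    (0x4E00, 0x9FFF, 2),  # CJK unified ideographs, treated as letters
]
STARTS = [first for first, _, _ in RANGES]
NUM_CLASSES = 3  # the class-indexed row has 3 entries instead of 65536


def char_class(ch):
    """Binary-search the range list; this is the extra per-character cost."""
    code = ord(ch)
    i = bisect_right(STARTS, code) - 1
    if i >= 0 and code <= RANGES[i][1]:
        return RANGES[i][2]
    return 0


# States: 0 = start, 1 = in number, 2 = in identifier. -1 = no transition.
TRANSITIONS = [
    [-1, 1, 2],    # start
    [-1, 1, -1],   # number: digits only
    [-1, 2, 2],    # identifier: letters or digits
]
ACCEPT = {1: "NUMBER", 2: "IDENT"}


def scan_token(text, pos=0):
    """Return (kind, lexeme) for the longest match at pos, or None."""
    state, i, last = 0, pos, None
    while i < len(text):
        nxt = TRANSITIONS[state][char_class(text[i])]
        if nxt < 0:
            break
        state, i = nxt, i + 1
        last = (ACCEPT[state], text[pos:i])
    return last
```

The tables shrink from 65536 entries per state to 3, but every input
character now pays for a search through the range list before the table
lookup, which is the sort of overhead that makes such a lexer slow
compared with a direct 256-entry index.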
Regards,
Mike