Re: Multibyte/Wide Character Sets and Lex.

Michael Parkes <mparks@oz.net>
14 Feb 1996 21:24:08 -0500

          From comp.compilers

From: Michael Parkes <mparks@oz.net>
Newsgroups: comp.compilers
Date: 14 Feb 1996 21:24:08 -0500
Organization: Sense Networking Seattle (www.oz.net)
References: 96-02-065
Keywords: lex, i18n

"Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com> wrote:
>[Lex handles 7 or 8 bit chars, not 16 bit or wider. How do I lex Japanese?]


>[I don't know of any lex that handles wider than 8 bit characters.
>The extension from 8 to 16 bit lexers isn't straightforward, since
>most 8 bit lexers use the character codes as array indices. That's
>considerably less practical when the arrays are 64K each rather than
>256 words. -John]


In a lot of cases it simply does not matter that characters are
16 bits wide. Certainly some COBOL compilers just ignore this fact in
many situations. However, in general you are correct to say it is a
complex problem. John points out why it is hard to change most common
lexers. Even if they are modified to use some more economical
algorithm, the next question is usually "why does this lexer run like
a dog?" I know, because I have actually tried it: I had a lexer that
could parse 16-bit characters. Needless to say, I changed it to improve
performance.
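
To make the "more economical algorithm" idea concrete, here is a
minimal sketch in C of one common scheme. It is not the lexer described
above. Each 16-bit character is first mapped to a small character
class through a paged table, and the state table is indexed by class
instead of by raw character code. The sketch assumes the input is
16-bit Unicode code units; the class, state, and page names are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical character classes and lexer states for illustration. */
enum { CLS_OTHER, CLS_DIGIT, CLS_LETTER, NUM_CLASSES };
enum { ST_START, ST_NUMBER, ST_IDENT, ST_ERROR, NUM_STATES };

/* Pages of 256 characters; identical pages are stored only once. */
enum { PAGE_OTHER, PAGE_ASCII, PAGE_KANA, PAGE_LETTER, NUM_PAGES };

static uint8_t page_of[256];             /* high byte -> shared page   */
static uint8_t class_in[NUM_PAGES][256]; /* page, low byte -> class    */

/* One row per state, one column per class: 4 x 3 entries in total,
   instead of 65536 entries per state. */
static const uint8_t next_state[NUM_STATES][NUM_CLASSES] = {
    /*             OTHER     DIGIT      LETTER   */
    /* START  */ { ST_ERROR, ST_NUMBER, ST_IDENT },
    /* NUMBER */ { ST_ERROR, ST_NUMBER, ST_ERROR },
    /* IDENT  */ { ST_ERROR, ST_IDENT,  ST_IDENT },
    /* ERROR  */ { ST_ERROR, ST_ERROR,  ST_ERROR },
};

/* Fill the class tables. Static arrays start zeroed, so every high
   byte initially maps to PAGE_OTHER and every entry to CLS_OTHER.
   The character literals assume an ASCII-compatible host. */
void init_tables(void)
{
    unsigned c;
    for (c = '0'; c <= '9'; c++) class_in[PAGE_ASCII][c] = CLS_DIGIT;
    for (c = 'A'; c <= 'Z'; c++) class_in[PAGE_ASCII][c] = CLS_LETTER;
    for (c = 'a'; c <= 'z'; c++) class_in[PAGE_ASCII][c] = CLS_LETTER;
    /* Hiragana U+3041..U+3096 and katakana U+30A1..U+30FA. */
    for (c = 0x41; c <= 0x96; c++) class_in[PAGE_KANA][c] = CLS_LETTER;
    for (c = 0xA1; c <= 0xFA; c++) class_in[PAGE_KANA][c] = CLS_LETTER;
    /* One all-letter page shared by every CJK ideograph page. */
    for (c = 0; c < 256; c++) class_in[PAGE_LETTER][c] = CLS_LETTER;

    page_of[0x00] = PAGE_ASCII;
    page_of[0x30] = PAGE_KANA;
    for (c = 0x4E; c <= 0x9F; c++) page_of[c] = PAGE_LETTER;
}

/* Two small lookups replace one lookup in a 64K-entry array. */
unsigned classify(uint16_t c)
{
    return class_in[page_of[c >> 8]][c & 0xFF];
}

/* Advance the automaton by one 16-bit character. */
unsigned step(unsigned state, uint16_t c)
{
    assert(state < NUM_STATES);
    return next_state[state][classify(c)];
}

/* Run the automaton over n characters and return the final state. */
unsigned scan(const uint16_t *s, size_t n)
{
    unsigned state = ST_START;
    size_t i;
    for (i = 0; i < n; i++)
        state = step(state, s[i]);
    return state;
}
```

After a call to init_tables(), scan() can be run over a buffer of
16-bit characters. In this sketch the class tables take 1280 bytes in
total, against 64K entries per state for a table indexed directly by
character code. The price is an extra lookup or two per character,
which is one place where the kind of slowdown mentioned above can come
from.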


Regards,


Mike
--

