Related articles |
---|
Multibyte/Wide Character Sets and Lex. juliano@SYDPO4.AUS.unisys.com (Orbach, Julian ACUS) (1996-02-09) |
Re: Multibyte/Wide Character Sets and Lex. colas@aye.inria.fr (1996-02-09) |
Re: Multibyte/Wide Character Sets and Lex. sharris@fox.nstn.ca (Sandy Harris) (1996-02-10) |
Re: Multibyte/Wide Character Sets and Lex. schwartz@galapagos.cse.psu.edu (1996-02-12) |
Re: Multibyte/Wide Character Sets and Lex. pjbumbul@math.uwaterloo.ca (1996-02-13) |
Re: Multibyte/Wide Character Sets and Lex. fjh@cs.mu.OZ.AU (1996-02-13) |
Re: Multibyte/Wide Character Sets and Lex. peter@csgrs6k1.uwaterloo.ca (1996-02-14) |
[2 later articles] |
From: | "Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com> |
Newsgroups: | comp.compilers |
Date: | 9 Feb 1996 12:10:15 -0500 |
Organization: | Compilers Central |
Keywords: | lex, i18n, question, comment |
This question was asked over a year ago (94-11-096) by someone else,
but did not appear to get a positive response, so I am trying again.
Lex handles 7-bit ASCII, and many versions appear to handle 8-bit
character sets too. However, there are other character sets in use,
particularly Unicode (a 16-bit character set) and Shift-JIS (a
variable-width character set, 8 or 16 bits per character, used for
Japanese by Microsoft).
I have become stuck with the problem of how to lex a language which
allows Japanese (i.e. non-ASCII) characters in identifiers and string
literals.
Can anyone provide hints on how this could be achieved?
I have MKS Lex (and Flex too, and I am prepared to try others), and
Microsoft Visual C++ V4.0 running in a Windows NT environment.
Thanks
Julian Orbach
Australian Centre for Unisys Software
[I don't know of any lex that handles wider than 8 bit characters.
The extension from 8 to 16 bit lexers isn't straightforward, since
most 8 bit lexers use the character codes as array indices. That's
considerably less practical when the arrays are 64K each rather than
256 words. -John]
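To make the moderator's point about array indices concrete, here is a minimal sketch of a table-driven identifier scanner whose transition rows are indexed directly by character code. It is not taken from any lex implementation; the names build_identifier_dfa, longest_match, and table_entries are hypothetical, and treating hiragana as identifier letters is only an assumption chosen for illustration. Real generators normally compress their tables, so the dense layout here is the worst case.

```python
# Illustrative sketch only: a dense, two-state DFA for identifiers.
# Each state owns one row, and each row has one slot per character code.

LETTER_RANGES = [(ord("A"), ord("Z")), (ord("a"), ord("z")), (ord("_"), ord("_"))]
DIGIT_RANGES = [(ord("0"), ord("9"))]
HIRAGANA = (0x3041, 0x3096)  # assumption: hiragana letters count as identifier letters


def build_identifier_dfa(alphabet_size, extra_letter_ranges=()):
    """State 0 is the start state; state 1 means 'inside an identifier'."""
    table = [[-1] * alphabet_size for _ in range(2)]  # -1 means no transition
    for lo, hi in LETTER_RANGES + list(extra_letter_ranges):
        for code in range(lo, hi + 1):
            table[0][code] = 1  # a letter may start an identifier
            table[1][code] = 1  # and may continue one
    for lo, hi in DIGIT_RANGES:
        for code in range(lo, hi + 1):
            table[1][code] = 1  # a digit may only continue one
    return table


def longest_match(table, codes):
    """Length of the longest identifier prefix of codes (0 if none)."""
    state, length = 0, 0
    for i, code in enumerate(codes):
        nxt = table[state][code]  # the character code is the array index
        if nxt < 0:
            break
        state, length = nxt, i + 1
    return length


def table_entries(table):
    return sum(len(row) for row in table)


narrow = build_identifier_dfa(256)               # 8-bit character codes
wide = build_identifier_dfa(65536, [HIRAGANA])   # 16-bit character codes

print(table_entries(narrow))  # 512
print(table_entries(wide))    # 131072
print(longest_match(narrow, [ord(c) for c in "x1+y"]))        # 2
print(longest_match(wide, [0x3042, 0x3044, 0x31, 0x20]))      # 3
```

Even with only two states, the 16-bit table needs 256 times as many entries as the 8-bit one, and a realistic scanner has many more states than this. The lookup code itself does not change; only the row width does.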