Multibyte/Wide Character Sets and Lex.

"Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com>
9 Feb 1996 12:10:15 -0500

          From comp.compilers


From: "Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com>
Newsgroups: comp.compilers
Date: 9 Feb 1996 12:10:15 -0500
Organization: Compilers Central
Keywords: lex, i18n, question, comment

This question was asked over a year ago (94-11-096) by someone else,
but did not appear to get a positive response, so I am trying again.


Lex handles 7-bit ASCII, and many versions appear to handle 8-bit
character sets too. However, there are other character sets in use,
particularly Unicode (a 16-bit character set) and Shift-JIS (a
variable-width encoding of one or two bytes per character, used by
Microsoft for Japanese).


I have become stuck with the problem of how to lex a language which
allows Japanese (i.e. non-ASCII) characters in identifiers and string
literals.


Can anyone provide hints on how this could be achieved?


I have MKS Lex (and Flex too, and I am prepared to try others), and
Microsoft Visual C++ V4.0 running in a Windows NT environment.


Thanks


Julian Orbach
Australian Centre for Unisys Software
[I don't know of any lex that handles characters wider than 8 bits.
The extension from 8-bit to 16-bit lexers isn't straightforward, since
most 8-bit lexers use the character codes as array indices. That's
considerably less practical when the arrays have 64K entries each
rather than 256. -John]
--

