Multibyte/Wide Character Sets and Lex.

"Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com>
9 Feb 1996 12:10:15 -0500

From comp.compilers

From:	"Orbach, Julian ACUS" <juliano@SYDPO4.AUS.unisys.com>
Newsgroups:	comp.compilers
Date:	9 Feb 1996 12:10:15 -0500
Organization:	Compilers Central
Keywords:	lex, i18n, question, comment

This question was asked over a year ago (94-11-096) by someone else,
but did not appear to get a positive response, so I am trying again.

Lex handles 7-bit ASCII, and many versions appear to handle 8-bit
ASCII too. However, there are other character sets in use,
particularly Unicode (a 16-bit character set) and Shift-JIS (a
variable-width 8- or 16-bit character set used for Japanese by Microsoft).

I have become stuck with the problem of how to lex a language which
allows Japanese (i.e. non-ASCII) characters in identifiers and string
literals.

Can anyone provide hints on how this could be achieved?

I have MKS Lex (and Flex too, and I am prepared to try others), and
Microsoft Visual C++ V4.0 running in a Windows NT environment.

Thanks

Julian Orbach
Australian Centre for Unisys Software
[I don't know of any lex that handles characters wider than 8 bits.
The extension from 8-bit to 16-bit lexers isn't straightforward, since
most 8-bit lexers use the character codes as array indices. That's
considerably less practical when the arrays are 64K words each rather
than 256 words. -John]
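
The table-size concern in the note above can be sketched in C. This is an illustration only and is not taken from any actual lex implementation: the names `dense_table_bytes`, `classify`, and the `CLASS_*` constants are hypothetical, and so is the assumption of 16-bit table entries. The first function gives the size of a dense transition table indexed directly by character code. The second shows one possible workaround, which is to map each 16-bit code to a small character class first and index the table by class instead. Treating every code at or above 0x80 as a letter is a simplifying assumption, not a rule of any particular language.

```c
/* Hypothetical character classes; a real scanner would need more. */
enum char_class {
    CLASS_OTHER,
    CLASS_LETTER,
    CLASS_DIGIT,
    CLASS_SPACE,
    N_CLASSES
};

/* Size in bytes of a dense transition table holding one 16-bit entry
 * per (state, input symbol) pair. The entry width is an assumption. */
unsigned long dense_table_bytes(unsigned long n_states,
                                unsigned long n_symbols)
{
    return n_states * n_symbols * 2ul;
}

/* Map a 16-bit character code to a small class, so that the
 * transition table needs N_CLASSES columns instead of 65536.
 * Assumption: any code >= 0x80 may appear in an identifier. */
enum char_class classify(unsigned int c)
{
    if (c == ' ' || c == '\t' || c == '\n')
        return CLASS_SPACE;
    if (c >= '0' && c <= '9')
        return CLASS_DIGIT;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        return CLASS_LETTER;
    if (c >= 0x80u)
        return CLASS_LETTER;
    return CLASS_OTHER;
}
```

For a hypothetical scanner with 200 states, `dense_table_bytes(200, 256)` is 102,400 bytes, while `dense_table_bytes(200, 65536)` is 26,214,400 bytes. Indexing by class instead, `dense_table_bytes(200, N_CLASSES)` is 1,600 bytes, at the cost of one `classify` call per input character.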