8-bit or Multi byte chars with Lex

shailu@research.trddc.ernet.in (Shailendra Abhyankar)
Mon, 14 Nov 1994 13:01:55 GMT

From comp.compilers


Newsgroups:	comp.compilers
From:	shailu@research.trddc.ernet.in (Shailendra Abhyankar)
Keywords:	lex, question, i18n, comment
Organization:	Compilers Central
Date:	Mon, 14 Nov 1994 13:01:55 GMT

To all:

Does anybody know how to scan programs that can have identifiers in
English as well as Japanese (or a mixture)? By Japanese characters
I mean hiragana/katakana and kanji (well, whatever that means ...).

We want to use lex; in fact, I have already used lex to process the input
(excluding the Japanese stuff). The version of lex I have does not
support 8-bit symbols (yymatch seems to be a 128-byte array).

The Japanese characters may be part of identifiers or comments. I think
comments do not pose any problems, as handling them simply requires dumping
these characters (so they need not be specified in any match pattern).

For Japanese characters in identifiers, I may require some preprocessing
before using lex. Say something like:

Converting the extended ASCII stuff to some unused 7-bit char values.
Then using the above mapped characters for specifying identifiers.

Though I am not sure this is plausible or, if it is, whether it is the
best approach.
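
A minimal sketch of the preprocessing idea above, in C. A one-to-one
mapping cannot cover everything, since 7-bit ASCII has only 128 values and
Japanese character sets have thousands of characters, so this sketch
instead escapes every byte with the high bit set as a marker byte followed
by two hex digits. The marker value 0x01 and the name encode_7bit are
assumptions made for illustration, not part of lex; the marker is assumed
never to occur in the original source text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical escape marker; assumed absent from the input text. */
#define ESC_MARK '\001'

/*
 * Copy the NUL-terminated string src into dst as pure 7-bit text.
 * Each byte >= 0x80 becomes ESC_MARK plus two uppercase hex digits;
 * all other bytes are copied unchanged.
 * Returns the number of bytes written (not counting the final NUL),
 * or (size_t)-1 if dst (capacity cap) is too small.
 */
size_t encode_7bit(const char *src, char *dst, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, out = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    len = strlen(src);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];

        if (c >= 0x80) {
            /* Need 3 bytes for the escape plus 1 for the final NUL. */
            if (cap - out < 4)
                return (size_t)-1;
            dst[out++] = ESC_MARK;
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        } else {
            /* Need 1 byte for the character plus 1 for the final NUL. */
            if (cap - out < 2)
                return (size_t)-1;
            dst[out++] = (char)c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

With an encoding like this, the lex identifier pattern could treat the
sequence \001[0-9A-F][0-9A-F] as one more kind of "letter", and the
matched text would have to be decoded again before it is stored or printed.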

Is there a freely available version of lex that handles 8-bit or
multibyte characters, and in particular Japanese character sets? I am sure
there must be some, considering Unicode etc. ...

If anyone can give me useful pointers, it will be great.

Later
Shailendra Abhyankar

Shailendra Abhyankar e-mail : shailu@trddc.ernet.in
Tata Consultancy Services Tel : 91-212-622809
1 Mangaldas Rd
Pune 411001
India

[Flex handles 8-bit characters without trouble, but I don't offhand know
of scanner generators that handle multibyte or wide characters. -John]
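
With an 8-bit-clean scanner, one possible shortcut is to treat every byte
with the high bit set as an identifier character. The C sketch below shows
that rule by hand. It assumes an encoding such as EUC-JP, in which every
byte of a multibyte character is 0x80 or above; it would not be safe for
Shift-JIS, where the second byte of a character can fall in the ASCII
range. The function names are hypothetical. In a flex specification the
same rule would be a character class along the lines of [A-Za-z_\x80-\xff].

```c
#include <assert.h>
#include <stddef.h>

/* A byte may start an identifier if it is an ASCII letter, '_',
 * or any byte with the high bit set (assumed to be part of a
 * multibyte character). */
static int is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c >= 0x80;
}

/* After the first byte, ASCII digits are allowed as well. */
static int is_ident_part(unsigned char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/*
 * Return the length in bytes of the identifier at the start of s,
 * or 0 if s does not begin with an identifier. Lengths are counted
 * in bytes, not characters.
 */
size_t ident_length(const char *s)
{
    const unsigned char *p;
    size_t n = 0;

    assert(s != NULL);
    p = (const unsigned char *)s;
    if (!is_ident_start(p[0]))
        return 0;
    while (is_ident_part(p[n]))   /* stops at NUL or any other byte */
        n++;
    return n;
}
```

This accepts any high-bit byte sequence, valid or not, so a real scanner
would probably want to validate the multibyte sequences separately.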