|8-bit or Multi byte chars with Lex email@example.com (1994-11-14)|
|From:||firstname.lastname@example.org (Shailendra Abhyankar)|
|Keywords:||lex, question, i18n, comment|
|Date:||Mon, 14 Nov 1994 13:01:55 GMT|
To all:
Does anybody know how to scan programs which can have identifiers in
English as well as Japanese (or a mixture)? By Japanese characters
I mean hiragana/katakana and kanji (well, whatever that means ...).
We want to use lex; in fact, I have already used lex to process the input
(excluding the Japanese stuff). The version of lex I have does not
support 8-bit symbols (yymatch seems to be a 128-byte array).
The Japanese characters may be part of identifiers or comments. I think
comments do not pose any problems, as handling them simply requires
discarding these characters (which hence need not be specified in any
match pattern).
For Japanese characters in identifiers, I may require some preprocessing
before using lex, say something like:
Converting the extended ASCII stuff to some unused 7-bit char values.
Then using the mapped characters above for specifying identifiers.
Though I am not sure whether this is plausible or, if so, whether it is
the best approach.
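
A minimal sketch of the preprocessing idea above, in C. It is not from the
original post: the function name map_high_bytes and the PLACEHOLDER value
are hypothetical, and it assumes an EUC-style encoding in which every byte
of a Japanese character has the high bit set (this does not hold for
Shift-JIS, whose second byte can be a 7-bit value). Every high-bit byte is
replaced by one 7-bit placeholder, so a 7-bit lex can match it inside
identifiers. The copy keeps the same length, so token offsets still index
into the original buffer and the real characters can be recovered from
there.

```c
#include <stddef.h>

/* Hypothetical placeholder: a 7-bit control code assumed not to occur
   anywhere in the source text being scanned. */
#define PLACEHOLDER 0x01

/* Copy n bytes from src to dst, replacing each byte that has the high
   bit set with PLACEHOLDER. dst has the same length as src, so offsets
   reported by the scanner are valid in the original buffer too. */
void map_high_bytes(const unsigned char *src, char *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (src[i] & 0x80) ? (char)PLACEHOLDER : (char)src[i];
}
```

The lex identifier pattern would then include the placeholder, for example
as an octal escape such as \001 in the character class. The mapping is
lossy by design, which is why the original buffer has to be kept alongside
the mapped copy.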
Is there a freely available version of lex which handles 8-bit or
multibyte characters, and in particular Japanese character sets? I am sure
there must be some, considering Unicode etc ...
If anyone can give me useful pointers, it will be great.
Shailendra Abhyankar e-mail : email@example.com
Tata Consultancy Services Tel : 91-212-622809
1 Mangaldas Rd
[Flex handles 8-bit characters without trouble, but I don't offhand know
of scanner generators that handle multibyte or wide characters. -John]
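
As a rough illustration of what 8-bit-clean identifier matching amounts to,
here is a hand-written C sketch, not flex output and not from the original
post. The helper names are hypothetical. It assumes an encoding in which
every byte of a non-ASCII character has the high bit set, and it treats any
such byte as an identifier character without checking that the byte
sequence is well formed. In a flex specification the same idea would
presumably be a character class along the lines of
[A-Za-z_\x80-\xff][A-Za-z0-9_\x80-\xff]*.

```c
#include <stddef.h>

/* ASCII letters, underscore, and any byte with the high bit set may
   start an identifier (assumption: high-bit bytes only ever occur as
   part of Japanese characters). */
static int is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || c == '_' || c >= 0x80;
}

/* Digits are additionally allowed after the first character. */
static int is_ident_part(unsigned char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Return the length in bytes of the identifier at the start of the
   NUL-terminated string s, or 0 if s does not begin with one. */
size_t ident_length(const char *s)
{
    /* Work on unsigned bytes so values >= 0x80 are never negative. */
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    if (!is_ident_start(p[0]))
        return 0;
    while (is_ident_part(p[n]))
        n++;
    return n;
}
```

The cast to unsigned char is the important detail: with a signed char, or a
128-entry table indexed by character value, bytes above 0x7F are exactly
what breaks.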