Subject: 8-bit or Multi byte chars with Lex
Newsgroups: comp.compilers
From: shailu@research.trddc.ernet.in (Shailendra Abhyankar)
Keywords: lex, question, i18n, comment
Organization: Compilers Central
Date: Mon, 14 Nov 1994 13:01:55 GMT
To all:
Does anybody know how to scan programs that can have identifiers in
English as well as Japanese (or a mixture of both)? By Japanese characters
I mean hiragana/katakana and kanji (well, whatever that means...).
We want to use lex; in fact, I have already used lex to process the input
(excluding the Japanese stuff). The version of lex I have does not
support 8-bit symbols (yymatch seems to be a 128-byte array).
The Japanese characters may be part of identifiers or comments. I think
comments do not pose any problems, since comment text is simply dumped,
so these characters need not be specified in any match pattern.
For Japanese characters in identifiers, I may need some preprocessing
before running lex, something like:
1. Convert the extended (8-bit) characters to some unused 7-bit character values.
2. Use these mapped characters when specifying identifiers.
I am not sure this is feasible, or, if it is, whether it is the best
approach.
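The mapping idea above can be sketched as a separate filter pass run before lex. Because only a few dozen 7-bit codes (the control characters) could plausibly be spare, fewer than the 128 byte values with the high bit set, a one-to-one substitution cannot be reversible; this minimal sketch instead rewrites each such byte as a marker followed by two hex digits. The marker value 0x1F and the names `to_7bit`/`from_7bit` are hypothetical choices, not part of any lex distribution.

```python
# Hypothetical marker byte: ASCII 0x1F (unit separator). The marker itself is
# escaped too, so the mapping stays reversible even if 0x1F occurs in the input.
MARKER = 0x1F
HEX = b"0123456789ABCDEF"


def to_7bit(data: bytes) -> bytes:
    """Replace each byte >= 0x80 (and the marker) with MARKER + two hex digits."""
    out = bytearray()
    for b in data:
        if b < 0x80 and b != MARKER:
            out.append(b)  # plain 7-bit byte: pass through unchanged
        else:
            out += bytes((MARKER, HEX[b >> 4], HEX[b & 0x0F]))
    return bytes(out)


def from_7bit(data: bytes) -> bytes:
    """Invert to_7bit, e.g. to recover the original identifier from yytext."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            out.append(int(data[i + 1:i + 3], 16))  # decode the two hex digits
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print(to_7bit(b"x\xc6\xfc"))  # → b'x\x1fC6\x1fFC'
```

With the filtered input, an identifier can be described using only 7-bit characters, for example with a lex definition such as `JCHAR \037[89A-F][0-9A-F]` and the rule `({LETTER}|{JCHAR})({LETTER}|{DIGIT}|{JCHAR})*`, where `LETTER` and `DIGIT` stand for whatever the language already uses. Restricting the first hex digit to `[89A-F]` keeps an escaped marker (`\0371F`) from being taken as part of a Japanese character. The costs are that Japanese text triples in length and that `yytext` must go back through `from_7bit` before an identifier is printed or stored.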
Is there a freely available version of lex that handles 8-bit or
multibyte characters, and Japanese character sets in particular? I am sure
there must be some, considering Unicode etc.
Any useful pointers would be much appreciated.
Later
Shailendra Abhyankar
Shailendra Abhyankar e-mail : shailu@trddc.ernet.in
Tata Consultancy Services Tel : 91-212-622809
1 Mangaldas Rd
Pune 411001
India
[Flex handles 8-bit characters without trouble, but I don't offhand know
of scanner generators that handle multibyte or wide characters. -John]
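With an 8-bit clean scanner such as flex, multibyte characters can be handled without wide-character support: a Japanese character is just a short byte sequence, so it can be written as a pattern over byte ranges. The sketch below assumes EUC-JP input (the post does not say which encoding is used) and uses Python's `re` module on bytes to mirror the patterns one might put in a flex specification; the identifier rule and the function name `identifiers` are hypothetical.

```python
import re

# ASSUMPTION: the input is EUC-JP. A Japanese character is then one of:
#   two bytes in 0xA1-0xFE         (JIS X 0208: kanji, hiragana, katakana)
#   0x8E + one byte in 0xA1-0xDF   (half-width katakana)
#   0x8F + two bytes in 0xA1-0xFE  (JIS X 0212)
# This also accepts full-width punctuation (e.g. bytes A1 A2, an ideographic
# comma); a real rule would probably narrow the first-byte range.
JCHAR = rb"(?:[\xa1-\xfe][\xa1-\xfe]|\x8e[\xa1-\xdf]|\x8f[\xa1-\xfe][\xa1-\xfe])"

# Hypothetical identifier rule: an ASCII letter, '_' or Japanese character,
# followed by any number of those or ASCII digits.
IDENT = re.compile(
    rb"(?:[A-Za-z_]|" + JCHAR + rb")(?:[A-Za-z0-9_]|" + JCHAR + rb")*"
)


def identifiers(source: bytes) -> list[bytes]:
    """Return the identifiers found in source, scanning left to right."""
    return IDENT.findall(source)


print(identifiers(b"x1 = \xc6\xfc\xcb\xdc + abc\xa4\xa2;"))
# → [b'x1', b'\xc6\xfc\xcb\xdc', b'abc\xa4\xa2']
```

The corresponding flex definition would be roughly `JCHAR ([\xA1-\xFE][\xA1-\xFE]|\x8E[\xA1-\xDF]|\x8F[\xA1-\xFE][\xA1-\xFE])`, used in the identifier rule alongside `{LETTER}`. Shift-JIS is harder to handle this way because its trail bytes overlap printable ASCII (0x40-0x7E, including `\`), so every pattern must consume a lead byte and its trail byte together.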