|A question about lexer portability in C ? email@example.com (Frederic Guerin) (1997-09-23)|
|Re: A question about lexer portability in C ? firstname.lastname@example.org (1997-09-24)|
|Re: A question about lexer portability in C ? email@example.com (Henry Spencer) (1997-09-28)|
|From:||Frederic Guerin <firstname.lastname@example.org>|
|Date:||23 Sep 1997 23:45:29 -0400|
|Keywords:||lex, i18n, comment|
I need to write a portable lexer in C for a language X. A good way to
tell whether a given character is, for example, an operator of the
language is to look it up in a boolean table whose element at index N
says whether the character with code N is part of the operator set.
The question is: can I fix this table at compile time, or do I need to
build it at run time to make sure that the correct codes are
assigned to the correct characters?
The previous question is answered naturally if the answer to the
following is yes: suppose the characters used by language X are all
part of the ANSI character set. May I assume that every character set
in use around the world is a superset of the ANSI one (with identical
character codes)?
If not, are parser and lexer generators safe in this regard?
Any comment or pointer appreciated,
[The answer to all of those questions is NO, since there are still
plenty of computers that use EBCDIC. The best approach I've seen is
to generate your lexer using ASCII, then at runtime create a table
that maps your local character set to ASCII and translate each
character before looking it up in the scanner's tables. For character
sets larger than ASCII, you can usually get away by translating all
the extra characters to a default "other" character for lexing
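A minimal sketch of the translation-table approach described in the note
above, not a definitive implementation. It relies on the fact that a C
compiler converts character and string literals to the execution character
set, so a string listing the printable ASCII characters in ASCII order
yields the local code of each one. All names here (ASCII_OTHER,
ascii_printable, to_ascii, is_operator, init_tables, is_operator_char) are
hypothetical, and the operator set + - * / = is only an assumed example for
language X. It also assumes the local character set has codes for $, @ and
`, which the C standard does not guarantee.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical catch-all ASCII code for characters with no ASCII equivalent. */
#define ASCII_OTHER 0x7F

/* Printable ASCII characters in ASCII order, codes 32 through 126.
   The compiler stores them using the local (execution) character set. */
static const char ascii_printable[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
    "abcdefghijklmnopqrstuvwxyz{|}~";

/* Maps a local character code to its ASCII code; filled in at run time. */
static unsigned char to_ascii[UCHAR_MAX + 1];

/* Scanner table indexed by ASCII code, fixed at compile time.
   Assumed operators of language X: * + - / = */
static const unsigned char is_operator[128] = {
    [0x2A] = 1, [0x2B] = 1, [0x2D] = 1, [0x2F] = 1, [0x3D] = 1
};

/* Build the local-to-ASCII map once, before any lexing. */
static void init_tables(void)
{
    size_t i;

    /* Everything not listed below becomes the default "other" character. */
    for (i = 0; i <= UCHAR_MAX; i++)
        to_ascii[i] = ASCII_OTHER;

    /* The i-th character of the string is ASCII code 32 + i. */
    for (i = 0; ascii_printable[i] != '\0'; i++)
        to_ascii[(unsigned char)ascii_printable[i]] = (unsigned char)(32 + i);

    /* The whitespace controls a lexer usually cares about. */
    to_ascii[(unsigned char)'\t'] = 9;
    to_ascii[(unsigned char)'\n'] = 10;
}

/* Translate the local character first, then consult the ASCII-indexed table. */
static int is_operator_char(int c)
{
    return is_operator[to_ascii[(unsigned char)c]];
}
```

After calling init_tables() once, is_operator_char('+') is nonzero and
is_operator_char('a') is zero whether the machine uses ASCII or EBCDIC,
because only to_ascii depends on the local character set.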