Related articles:
A question about lexer portability in C ? frederic.guerin@sympatico.ca (Frederic Guerin) (1997-09-23)
Re: A question about lexer portability in C ? cfc@world.std.com (1997-09-24)
Re: A question about lexer portability in C ? henry@zoo.toronto.edu (Henry Spencer) (1997-09-28)
From: Henry Spencer <henry@zoo.toronto.edu>
Newsgroups: comp.compilers
Date: 28 Sep 1997 23:17:03 -0400
Organization: SP Systems, Toronto
References: 97-09-090
Keywords: lex, i18n
Frederic Guerin <frederic.guerin@sympatico.ca> wrote:
>The question is : Can I fix this table at compile time or do I need to
>build it at run time so as to make sure that the correct codes will be
>assigned to the correct characters ?
In general, you must build it at run time. Different users, even on a
single system, may be using different character sets, with different
ideas about what constitutes (say) an alphabetic character. Except in
unusually favorable environments, there's just no way to pre-build a
single copy of the code and have it always get things right.
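A minimal sketch of the run-time approach, assuming a single-byte character set and a lexer that classifies bytes through a 256-entry table. The table is filled from the <ctype.h> predicates after setlocale() has installed the user's locale, so the classification follows whatever character set that user has selected. The names lexer_init, build_class_table, class_table and the CC_* classes are hypothetical.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Hypothetical character classes for a lexer. */
enum char_class { CC_OTHER, CC_SPACE, CC_DIGIT, CC_ALPHA };

/* One entry per possible byte value. */
static unsigned char class_table[UCHAR_MAX + 1];

/* Fill the table from the <ctype.h> predicates, which consult the
   current LC_CTYPE locale, instead of hard-coding character codes.
   Every value 0..UCHAR_MAX is a valid argument to these functions. */
static void build_class_table(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (isalpha(c))
            class_table[c] = CC_ALPHA;
        else if (isdigit(c))
            class_table[c] = CC_DIGIT;
        else if (isspace(c))
            class_table[c] = CC_SPACE;
        else
            class_table[c] = CC_OTHER;
    }
}

/* Adopt the user's character-type locale from the environment
   (LC_ALL, LC_CTYPE, LANG), then build the table for it. */
static void lexer_init(void)
{
    setlocale(LC_CTYPE, "");
    build_class_table();
}
```

Call lexer_init() once at startup, before scanning, and index the table with (unsigned char) values. If the program later switches LC_CTYPE, the table has to be rebuilt. Multibyte encodings would need the wide-character functions (mbrtowc, iswalpha) rather than a byte-indexed table.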
>...May I assume that all character sets used
>over the world are superset of the ANSI one ( with identical character
>code ) ?
Unfortunately, no. First, as our moderator mentioned, there is still
substantial use of totally non-ASCII character sets like EBCDIC. Second,
there is still substantial use of other ISO646-derived character sets
which resemble ASCII but are not supersets of it -- for example, some of
them have extra alphabetic characters where ASCII puts characters like "`"
and "[" and "|". Third, even when character sets are exact supersets of
ASCII, that doesn't mean you can just ignore the non-ASCII part, because
non-English users in particular will want to put non-ASCII alphabetics
into identifiers etc.
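To illustrate the portability points, here is a hedged sketch of an identifier scanner that asks the locale rather than testing code ranges. A test like c >= 'a' && c <= 'z' goes wrong on EBCDIC, where the lowercase letters are not contiguous (the codes between 'i' and 'j' are not letters), and it can never accept the non-ASCII letters a user's locale defines. The function names are hypothetical, and the sketch again assumes a single-byte character set.

```c
#include <ctype.h>
#include <stddef.h>

/* Letters as defined by the current locale, plus underscore.
   Callers pass unsigned char values: plain char may be signed, and
   passing a negative value other than EOF to <ctype.h> functions
   is undefined behavior. */
static int is_ident_start(unsigned char c)
{
    return isalpha(c) || c == '_';
}

static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Return the length of the identifier at the start of s,
   or 0 if s does not begin with one. */
static size_t scan_identifier(const char *s)
{
    size_t n;

    if (!is_ident_start((unsigned char)s[0]))
        return 0;
    n = 1;
    while (s[n] != '\0' && is_ident_char((unsigned char)s[n]))
        n++;
    return n;
}
```

For the user's own letters to be recognized, the program must call setlocale(LC_CTYPE, "") before scanning; in the default "C" locale only the basic Latin letters count as alphabetic.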
--
| Henry Spencer
| henry@zoo.toronto.edu
--