UCS Identifiers and compilers email@example.com (2008-12-10)
Re: UCS Identifiers and compilers DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-12-11)
Re: UCS Identifiers and compilers firstname.lastname@example.org (Dmitry A. Kazakov) (2008-12-11)
Re: UCS Identifiers and compilers email@example.com (James Harris) (2008-12-11)
Re: UCS Identifiers and compilers firstname.lastname@example.org (Marco van de Voort) (2008-12-11)
Re: UCS Identifiers and compilers email@example.com (Ira Baxter) (2008-12-11)
Re: UCS Identifiers and compilers firstname.lastname@example.org (Ray Dillinger) (2008-12-11)
Re: UCS Identifiers and compilers cfc@shell01.TheWorld.com (Chris F Clark) (2008-12-11)
[2 later articles]
From: Hans-Peter Diettrich <DrDiettrich1@aol.com>
Date: Thu, 11 Dec 2008 12:17:27 +0100
Posted-Date: 11 Dec 2008 07:56:02 EST
William Clodius wrote:
> 4. How does the incorporation of the larger character sets affect your
> lexical analysis? Is hash table efficiency affected? Do you have to deal
> with case/accent independence and if so how useful are the UCS
> recommendations for languages?
IMO a compiler is not a text processor, and consequently should not be
burdened with natural language conventions and textual representation
oddities. Text handling (as opposed to string handling) should be the
task of the application coder, not of the compiler or language designer.
Handling Unicode in string literals is no special problem when the
source code is already stored in UCS/UTF. The use of escape sequences in
string literals may deserve some consideration; concatenating
"plain" text with control character sequences may be preferable.
Identifiers also should be no problem, particularly when they are case
sensitive, so that a binary comparison is sufficient for identifier lookup.
APL used glyphs for keywords, which in those days required appropriately
equipped systems; nowadays such glyphs would be more practical and could
simplify the lexers. I'm also waiting for the first Chinese programming
language, with glyphs for keywords; what will then happen to the shared
code, which currently is still written in traditional programming
languages and ASCII characters?
AFAIK the Russian attempts to translate keywords into Cyrillic were
not really successful - but mostly due to the longer words, not
because of readability. As with Chinese glyphs, it should be possible
to parse "nationalized" source code and to reproduce the AST in a
different translation, properly retaining comments. Perhaps future
development systems will include such a feature, along with a dictionary
of commonly used identifier names and abbreviations?
Just my 0,02$ <BG>