From: | Chris F Clark <cfc@shell01.TheWorld.com> |
Newsgroups: | comp.compilers |
Date: | Thu, 11 Dec 2008 23:54:00 -0500 |
Organization: | The World Public Access UNIX, Brookline, MA |
References: | 08-12-061 |
Keywords: | i18n |
Posted-Date: | 12 Dec 2008 10:33:44 EST |
wclodius@los-alamos.net (William Clodius) writes:
> As a hobby I have started work on a language design and one of the
> issues that has come to concern me is the impact on the usefulness and
> complexity of implementation of incorporating UCS/Unicode into
> the language, particularly in identifiers.
>
> 1. Do many of your users make use of letters outside the ASCII/Latin-1
> sets?
We have one major Yacc++ customer that has a series of languages that
support Unicode identifiers. Some of those languages mix case-sensitive
and case-insensitive features. My experience relates primarily to
supporting them.
> 3. Visually how well do alternative character sets mesh with a language
> with ASCII keywords and left to right, up and down display, typical of
> most programming languages? eg. how well do scripts with ideographs,
> context dependent glyphs for the same character, and alternative spatial
> ordering work, or character sets with characters with glyphs similar to
> those used for ASCII (the l vs 1 and O vs. 0 problem multiplied)
The glyphs that look like ASCII are a definite problem, and it is
made worse when the look-alike glyphs have different properties from
the ASCII characters they resemble. In particular, a fair amount of
effort went into dealing with the Turkish dotless i. Apparently, there
is no capital form of this letter (or it shares its capital with some
other letter), and the system toupper/tolower routines did not deal
with it consistently across locales. As a result, we had to call those
routines in a consistent way and make certain that the locale had not
changed between calls. The difficulty is that some tables were built
at the time the compiler was built (and thus under one locale), which
may not be the same as the locale the user has specified when running
the compiler.
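
To make the table mismatch concrete, here is a minimal Python sketch. It
is not how Yacc++ or the customer's compilers handle it. The names
fold_fixed, fold_turkish, KEYWORDS and is_keyword are hypothetical, and
fold_turkish only imitates what a Turkish-locale tolower would do rather
than calling the C library. The keyword table is built once under one
folding rule; a lookup that folds under a different rule misses it.

```python
import unicodedata

def fold_fixed(name: str) -> str:
    """Locale-independent key: normalize, then use Unicode default case folding."""
    return unicodedata.normalize("NFC", name).casefold()

def fold_turkish(name: str) -> str:
    """Assumed stand-in for tolower under a Turkish locale:
    'I' maps to dotless i (U+0131), dotted capital I (U+0130) maps to 'i'."""
    return name.replace("I", "\u0131").replace("\u0130", "i").lower()

# Table built once, as if at compiler build time, under the fixed rule.
KEYWORDS = {fold_fixed(k) for k in ("IF", "WHILE")}

def is_keyword(word: str, fold) -> bool:
    """Look a word up in the prebuilt table using the given folding rule."""
    return fold(word) in KEYWORDS

print(is_keyword("If", fold_fixed))    # same rule as the table
print(is_keyword("If", fold_turkish))  # different rule: the lookup misses
```

Using one locale-independent folding function both when the tables are
built and when identifiers are looked up avoids the mismatch, whatever
locale the user runs the compiler under.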
Hope this helps,
-Chris
******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
------------------------------------------------------------------------------