Re: UCS Identifiers and compilers

Chris F Clark <cfc@shell01.TheWorld.com>
Thu, 11 Dec 2008 23:54:00 -0500

From comp.compilers


From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	Thu, 11 Dec 2008 23:54:00 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	08-12-061
Keywords:	i18n
Posted-Date:	12 Dec 2008 10:33:44 EST

wclodius@los-alamos.net (William Clodius) writes:

> As a hobby I have started work on a language design and one of the
> issues that has come to concern me is the impact on the usefulness and
> complexity of implementation of incorporating UCS/Unicode into
> the language, particularly in identifiers.
>
> 1. Do many of your users make use of letters outside the ASCII/Latin-1
> sets?

We have one major Yacc++ customer that has a series of languages that
support Unicode identifiers. Some of their languages have both case
sensitive and case insensitive features in the same language. My
experience relates primarily to supporting them.

> 3. Visually how well do alternative character sets mesh with a language
> with ASCII keywords and left to right, up and down display, typical of
> most programming languages? e.g. how well do scripts with ideographs,
> context dependent glyphs for the same character, and alternative spatial
> ordering work, or character sets with characters with glyphs similar to
> those used for ASCII (the l vs 1 and O vs. 0 problem multiplied)

The glyphs that look like ASCII are a definite problem, and it gets
worse when those look-alike glyphs have different properties. In
particular, a fair amount of effort went into dealing with the Turkish
dotless i (U+0131). Its capital form is the plain ASCII "I", while the
dotted "i" capitalizes to U+0130 (capital I with a dot above) under
Turkish rules, and the system toupper/tolower routines did not deal
consistently with it across locales. As a result, we had to take care
to use a consistent approach when calling those routines and to make
certain the locale had not changed between calls. The difficulty is
that some tables were built at the time the compiler was built (and
thus under one locale), which may not be the same as the locale the
user has specified when running the compiler.
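
To make the build-time versus run-time mismatch concrete, here is a
minimal sketch in C of one way to sidestep it: fold case with a fixed
table that never consults the process locale, so the tables baked into
the compiler and the comparisons done at run time cannot disagree. This
is not the approach described above, only an illustration. The function
names are hypothetical, and folding all four I variants to "i" is an
assumed policy that a real language would have to choose deliberately.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold one Unicode code point for case-insensitive
   identifier comparison. It never calls toupper/tolower, so the result
   does not depend on the locale in effect at build time or run time. */
uint32_t fold_code_point(uint32_t cp)
{
    if (cp >= 0x41 && cp <= 0x5A)   /* ASCII 'A'..'Z' */
        return cp + 0x20;           /* map to 'a'..'z' */
    if (cp == 0x0130)               /* capital I with dot above */
        return 0x69;                /* assumed policy: fold to 'i' */
    if (cp == 0x0131)               /* small dotless i */
        return 0x69;                /* assumed policy: fold to 'i' */
    return cp;                      /* everything else is left alone */
}

/* Hypothetical helper: compare two identifiers, given as arrays of code
   points, after folding. Returns 1 if they match, 0 otherwise. */
int identifiers_equal_nocase(const uint32_t *a, size_t na,
                             const uint32_t *b, size_t nb)
{
    size_t i;

    if (na != nb)
        return 0;
    for (i = 0; i < na; i++) {
        if (fold_code_point(a[i]) != fold_code_point(b[i]))
            return 0;
    }
    return 1;
}
```

With this policy an identifier spelled with U+0130 compares equal to
the same identifier spelled with a plain "i" no matter which locale the
user runs under. The price is that the dotted and dotless letters can
no longer be told apart in case-insensitive contexts.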

Hope this helps,
-Chris

******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
