Related articles:
[2 earlier articles]
Re: UCS Identifiers and compilers mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2008-12-11)
Re: UCS Identifiers and compilers james.harris.1@googlemail.com (James Harris) (2008-12-11)
Re: UCS Identifiers and compilers marcov@stack.nl (Marco van de Voort) (2008-12-11)
Re: UCS Identifiers and compilers idbaxter@semdesigns.com (Ira Baxter) (2008-12-11)
Re: UCS Identifiers and compilers bear@sonic.net (Ray Dillinger) (2008-12-11)
Re: UCS Identifiers and compilers cfc@shell01.TheWorld.com (Chris F Clark) (2008-12-11)
Re: UCS Identifiers and compilers bc@freeuk.com (Bartc) (2008-12-12)
Re: UCS Identifiers and compilers mike@mike-austin.com (Mike Austin) (2008-12-12)
From: "Bartc" <bc@freeuk.com>
Newsgroups: comp.compilers
Date: Fri, 12 Dec 2008 14:39:26 GMT
Organization: Compilers Central
References: 08-12-061
Keywords: i18n
Posted-Date: 12 Dec 2008 10:34:18 EST
"William Clodius" <wclodius@los-alamos.net> wrote in message
> As a hobby I have started work on a language design, and one of the
> issues that has come to concern me is the impact of incorporating
> UCS/Unicode into the language, particularly in identifiers, on its
> usefulness and the complexity of implementation.
> 1. Do many of your users make use of letters outside the ASCII/Latin-1
> sets?
My (very few) users were based in Europe, and I felt it important that
they be able to use any special characters of their language. So these
were marked as alphanumeric in a table of the 256 character codes.
This allowed identifiers to use the special characters, although
keywords were in English, with the possibility of using macros to
redefine them. But when I looked at their source code, I don't remember
seeing these being used. Maybe they were used to the restrictions of
other languages, or maybe I should have told them about the feature...
(Allowing the end-users to use special characters in their data, and
for filenames, and so on, was another matter with its own problems.)
This was a few years ago, and having only two possible 8-bit
character sets made things very easy. However, I don't think that
deciding whether a 16-bit or wider character is suitable for an
identifier is too challenging either.
Where identifiers need to be used outside the language (for linking,
for example), that's also only a minor problem if the other system is
more restricted. But my language was self-contained.
> 2. What are the most useful development environments in terms of dealing
> with extended character sets?
>
> 3. Visually how well do alternative character sets mesh with a language
> with ASCII keywords and left to right, up and down display, typical of
> most programming languages?
If a source file is considered a stream of 16-bit character codes,
then the visual representation is irrelevant (or at least, someone
else's headache).
> 4. How does the incorporation of the larger character sets affect your
> lexical analysis? Is hash table efficiency affected? Do you have to deal
> with case/accent independence
In my case support consisted of an entry in a table, which was very
simple. But it meant that accented versions of 'A', for example, were
all considered distinct. Still, that is better than treating upper and
lower case differently, which usually seems to be the case.
--
Bartc