|Re: UCS Identifiers and compilers firstname.lastname@example.org (Dmitry A. Kazakov) (2008-12-11)|
|Re: UCS Identifiers and compilers email@example.com (James Harris) (2008-12-11)|
|Re: UCS Identifiers and compilers firstname.lastname@example.org (Marco van de Voort) (2008-12-11)|
|Re: UCS Identifiers and compilers email@example.com (Ira Baxter) (2008-12-11)|
|Re: UCS Identifiers and compilers firstname.lastname@example.org (Ray Dillinger) (2008-12-11)|
|Re: UCS Identifiers and compilers cfc@shell01.TheWorld.com (Chris F Clark) (2008-12-11)|
|Re: UCS Identifiers and compilers email@example.com (Bartc) (2008-12-12)|
|Re: UCS Identifiers and compilers firstname.lastname@example.org (Mike Austin) (2008-12-12)|
|Date:||Fri, 12 Dec 2008 14:39:26 GMT|
|Posted-Date:||12 Dec 2008 10:34:18 EST|
"William Clodius" <email@example.com> wrote in message
> As a hobby I have started work on a language design and one of the
> issues that has come to concern me is the impact on the usefulness and
> complexity of implementation is the incorporation of UCS/Unicode into
> the language, particularly in identifiers.
> 1. Do many of your users make use of letters outside the ASCII/Latin-1
My (very few) users were based in Europe, and I felt it important that
they be able to use any special characters in their language. So these
were marked as being alphanumeric in a table of the 256 codes.
This allowed identifiers to use the special characters, although
keywords were in English, with the possibility of using macros to
redefine them. But when I looked at their source code, I don't remember
seeing these being used. Maybe they were used to the restrictions of
other languages, or maybe I should have told them about the feature...
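A table-driven scheme of this kind might look roughly like the sketch below. It is only an illustration: Latin-1 is assumed as the 8-bit character set (the post does not name the code page), and `IDENT_CHAR` and `scan_identifier` are hypothetical names. For brevity it does not reject a leading digit.

```python
# Hypothetical 256-entry classification table for an 8-bit character set.
# Latin-1 is an assumption; the original post does not say which code page was used.
IDENT_CHAR = [False] * 256

for code in range(256):
    ch = bytes([code]).decode("latin-1")
    # ASCII letters and digits, underscore, plus any Latin-1 letter (é, ß, Ø, ...).
    IDENT_CHAR[code] = ch.isalpha() or ch in "0123456789_"


def scan_identifier(src: bytes, pos: int) -> int:
    """Return the index just past the run of identifier characters starting at pos."""
    end = pos
    while end < len(src) and IDENT_CHAR[src[end]]:
        end += 1
    return end


source = "größe = 1".encode("latin-1")
end = scan_identifier(source, 0)
print(source[:end].decode("latin-1"))
```

Supporting a second 8-bit character set would then amount to building a second table in the same way.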
(Allowing the end-users to use special characters in their data, and
for filenames, and so on, was another matter with its own problems.)
This was a few years ago, and having the possibility of only two 8-bit
character sets made things very easy. However, I don't think that
deciding whether a 16-bit or wider character is suitable for an
identifier is too challenging.
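One way to make that decision for wider characters is to consult the Unicode general category of each code point. The sketch below is an assumption on my part rather than anything from the language described here: the category sets are a rough approximation of the usual "identifier start" and "identifier continue" rules, and the function names are hypothetical.

```python
import unicodedata

# Assumed category sets: letters (and letter-like numbers) may start an
# identifier; marks, decimal digits and connector punctuation may continue one.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_ident_start(ch: str) -> bool:
    return ch == "_" or unicodedata.category(ch) in START_CATEGORIES


def is_ident_continue(ch: str) -> bool:
    # "_" has category Pc, so it is accepted here without a special case.
    return unicodedata.category(ch) in CONTINUE_CATEGORIES


def is_identifier(name: str) -> bool:
    return (
        bool(name)
        and is_ident_start(name[0])
        and all(is_ident_continue(ch) for ch in name[1:])
    )


for candidate in ["größe", "переменная", "x2", "2x", "a-b"]:
    print(candidate, is_identifier(candidate))
```

A lexer could precompute this for the low code points into a table like the 8-bit one and fall back to the category lookup for everything else.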
Where the identifiers need to be used outside the language (for
linking for example), then that's also a minor problem if the other
system is more restricted. But my language was self-contained.
> 2. What are the most useful development environments in terms of dealing
> with extended character sets?
> 3. Visually how well do alternative character sets mesh with a language
> with ASCII keywords and left to right, up and down display, typical of
> most programming languages?
If a source file is considered a stream of 16-bit character codes,
then the visual representation is irrelevant (or at least, someone
> 4. How does the incorporation of the larger character sets affect your
> lexical analysis? Is hash table efficiency affected? Do you have to deal
> with case/accent independence
In my case support consisted of an entry in a table. Very simple. But
it means accented versions of 'A' for example were all considered
different. Better, however, than treating upper and lower case
differently, which usually seems to be the case.
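For a wider character set, the same policy (case ignored, accents significant) could be approximated by folding each identifier to a canonical key before it goes into the symbol table. This is a sketch under my own assumptions, not the table-based method described above; `fold_identifier` and its `ignore_accents` option are hypothetical.

```python
import unicodedata


def fold_identifier(name: str, ignore_accents: bool = False) -> str:
    """Return a canonical key for symbol-table lookup."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    if ignore_accents:
        # Optional: drop the combining marks to get accent independence.
        decomposed = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
    # Fold case, then recompose so equivalent spellings share one key.
    return unicodedata.normalize("NFC", decomposed.casefold())


symbols = {}
symbols[fold_identifier("Ärger")] = "declared here"

print(fold_identifier("ärger") in symbols)   # same name, different case
print(fold_identifier("Arger") in symbols)   # accent differs
```

Because the key is an ordinary string, the hash table itself needs no changes; the cost is one normalization pass per identifier.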