From: Hans Aberg <haberg_20080406@math.su.se>
Newsgroups: comp.compilers
Date: Mon, 23 Feb 2009 20:39:33 +0100
Organization: Aioe.org NNTP Server
References: 09-02-051 09-02-077 09-02-092 09-02-104 09-02-112
Keywords: i18n
Posted-Date: 24 Feb 2009 07:50:36 EST
Hans-Peter Diettrich wrote:
>> in general, UTF-8 takes less space than UTF-16 (and mixes much better with
>> code designed for ASCII), but many languages prefer UTF-16,
>> potentially because it works better when treated as an array.
>
> This IMO is a typical misconception of English-only speakers, who have
> caused a lot of trouble in the evolution of programming languages :-(
I might add: the UTF encodings were not designed with compression
in mind. If space is an issue, use a compression algorithm instead,
because it will be more efficient. And use UTF-16 only for backwards
compatibility (when libraries and other programs you must use require it);
UTF-8 avoids the endianness issue, as nowadays one mostly agrees on how to
order the bits of a byte. (The BOM used in some UTF-16 code to sort out
endianness is not a part of the Unicode standard.) UTF-32 might be good
in cases where fixed-width characters or perhaps speed are needed (like
internally in programs); but this requires endianness to be sorted out
between platforms.
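
To make the size and byte-order points concrete, here is a minimal Python sketch (not part of the original post). The helper names encoded_sizes and compressed_size and the sample string are assumptions chosen for illustration, zlib merely stands in for "a compression algorithm", and the explicit -le/-be codecs are used so that no BOM is written.

```python
import zlib


def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in three Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Explicit little-endian codecs: fixed byte order, no BOM emitted.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def compressed_size(text: str, encoding: str) -> int:
    """Return the zlib-compressed byte length of `text` in `encoding`."""
    return len(zlib.compress(text.encode(encoding)))


# Hypothetical sample: pure ASCII, 34 characters.
sample = "Internal Representation of Strings"
sizes = encoded_sizes(sample)

# Byte order matters for multi-byte code units, not for UTF-8.
utf16_le = "A".encode("utf-16-le")  # low byte first
utf16_be = "A".encode("utf-16-be")  # high byte first
utf8 = "A".encode("utf-8")          # one byte, no byte order to choose

if __name__ == "__main__":
    print(sizes)
    print(utf16_le, utf16_be, utf8)
    # A deliberately repetitive input, so compression has something to find.
    repeated = sample * 100
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        raw = len(repeated.encode(enc))
        print(enc, raw, compressed_size(repeated, enc))
```

For this ASCII sample the three forms take 34, 68 and 136 bytes. The ratios change for text outside ASCII, and compressed sizes depend heavily on the input, so none are quoted here.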
Hans Aberg