Re: compiler for Chinese development language

henry@spsystems.net (Henry Spencer)
27 Oct 2005 23:24:35 -0400

          From comp.compilers

From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers
Date: 27 Oct 2005 23:24:35 -0400
Organization: SP Systems, Toronto, Canada
References: 05-10-085 05-10-122 05-10-146 05-10-173
Keywords: i18n, comment
Posted-Date: 27 Oct 2005 23:24:35 EDT

Oliver Wong <owong@castortech.com> wrote:
>...It's not too bad to memorize an alphabet. With English, that's
>only 52 characters (you have to learn both the uppercase and lowercase
>version of every character, as they differ significantly)...


The number is actually a bit higher than that, because there are a few
letters which vary in basic shape from font to font; English speakers
are so used to this that they seldom notice it, but newcomers have to
learn the variations as separate forms. In italics, "f" grows a tail,
"Q"'s tail grows to almost an underline, and "a" is a completely
different shape (loop with a slight tail, rather than low loop with a
roof over it); in Helvetica and a lot of other sans-serif fonts, "g"'s
tail is a line rather than a loop. Printer and terminal(-emulation)
fonts pick one or the other almost at random.


(It's easy to dismiss the changes in tails in particular as trivial, but
differences between letters often are no bigger -- "j" is just "i" with
a tail, for example.)


>Even the Japanese Katakana alphabet has around 100 characters...
> Incidentally, the Japanese Katakana alphabet has a completely
>unambiguous pronunciation: Each character represents one syllable...


The two facts are, of course, connected: you need more characters to
give each syllable a unique one.


By the way, a linguistic nitpick: technically an alphabet is a writing
system with (approximately) one character per sound, like the English one,
and Katakana is a syllabary, not an alphabet. People who independently
invent writing systems generally invent either syllabaries or ideographic
systems; it appears that the alphabet concept was invented just once, by
some obscure neighbors of the Phoenicians, and all other alphabets derive
at least inspiration from theirs.
--
spsystems.net is temporarily off the air; | Henry Spencer
mail to henry at zoo.utoronto.ca instead. | henry@spsystems.net
[Fascinating though this discussion is, it's veered away from compilers, so
unless someone can tell us about compiling kana into kanji or something,
this thread is at an end. -John]

