From: Tom Copeland <tom@infoether.com>
Newsgroups: comp.compilers
Date: 16 Mar 2007 03:16:42 -0400
Organization: Compilers Central
References: 06-12-082 07-03-036
Keywords: i18n, Java
Posted-Date: 16 Mar 2007 03:16:42 EDT

On Thu, 2007-03-08 at 19:54 -0500, Tommy Nordgren wrote:
> ANTLR supports unicode, but one point to consider with ANY tool, is
> that you will need a module that supports converting the input text
> files to canonical utf-16.

JavaCC also handles Unicode characters; for example, this would tokenize
an optional minus sign followed by one or more digits, a space, and the
Unicode character for "degrees Fahrenheit" (U+2109) or "degrees Celsius"
(U+2103):

TOKEN : {
    <FAHRENHEIT_TEMPERATURE : (["-"])? <DIGITS> " \u2109">
  | <CELSIUS_TEMPERATURE : (["-"])? <DIGITS> " \u2103">
  | <#DIGITS : ["0"-"9"](["0"-"9"])*>
}

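To make the shape of those two tokens concrete, here is a rough Java
sketch that accepts the same strings using java.util.regex. It is not
what JavaCC generates; the class TemperatureTokens and the method
classify are made-up names for illustration, and the regular
expressions are my own hand translation of the token definitions.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not JavaCC output: it mirrors the two token
// shapes from the grammar using plain regular expressions.
final class TemperatureTokens {
    // Optional minus, one or more ASCII digits, one space, then U+2109.
    private static final Pattern FAHRENHEIT =
            Pattern.compile("-?[0-9]+ \u2109");
    // Same shape, but ending in U+2103.
    private static final Pattern CELSIUS =
            Pattern.compile("-?[0-9]+ \u2103");

    // Returns the name of the token the whole string matches, if any.
    static String classify(String text) {
        if (FAHRENHEIT.matcher(text).matches()) {
            return "FAHRENHEIT_TEMPERATURE";
        }
        if (CELSIUS.matcher(text).matches()) {
            return "CELSIUS_TEMPERATURE";
        }
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        System.out.println(classify("-40 \u2109")); // FAHRENHEIT_TEMPERATURE
        System.out.println(classify("21 \u2103"));  // CELSIUS_TEMPERATURE
        System.out.println(classify("21\u2103"));   // NO_MATCH (no space)
    }
}
```

Note that the space before the degree symbol is part of the token, so
input without it does not match either definition.
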
JavaCC doesn't yet handle supplementary characters (those outside the
Basic Multilingual Plane). But that's on our radar, so we shall see...
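
For anyone wondering why supplementary characters are harder: Java
strings are UTF-16, so a code point outside the BMP occupies two char
values (a surrogate pair). The sketch below only illustrates that
representation with standard java.lang calls; it says nothing about
JavaCC internals, and the class name SupplementaryDemo and the choice
of U+1D538 as the sample code point are mine.

```java
// Illustration only: shows how one supplementary code point is stored
// as two UTF-16 code units in a Java String.
final class SupplementaryDemo {
    // U+1D538 lies outside the Basic Multilingual Plane.
    static final int CODE_POINT = 0x1D538;

    // Builds a String holding exactly one code point.
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = asString(CODE_POINT);
        System.out.println(s.length());                       // 2
        System.out.println(s.codePointCount(0, s.length()));  // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(
                Character.isSupplementaryCodePoint(s.codePointAt(0))); // true
    }
}
```

A tokenizer that matches one char at a time sees the two surrogates
separately, so a character range written in terms of single \uXXXX
escapes cannot describe such a code point directly.
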
Yours,
Tom