Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with Unicode!

Tom Copeland <tom@infoether.com>
16 Mar 2007 03:16:42 -0400

          From comp.compilers

From: Tom Copeland <tom@infoether.com>
Newsgroups: comp.compilers
Date: 16 Mar 2007 03:16:42 -0400
Organization: Compilers Central
References: 06-12-082 07-03-036
Keywords: i18n, Java
Posted-Date: 16 Mar 2007 03:16:42 EDT

On Thu, 2007-03-08 at 19:54 -0500, Tommy Nordgren wrote:
> ANTLR supports unicode, but one point to consider with ANY tool, is
> that you will need an module that supports converting the input text
> files to canonical utf-16.


JavaCC also handles Unicode characters. For example, this would tokenize
an optional minus sign, one or more digits, a space, and then the Unicode
character for "degree Fahrenheit" (U+2109) or "degree Celsius" (U+2103):


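TOKEN : {
    // optional minus, digits, a space, then U+2109 DEGREE FAHRENHEIT
    <FAHRENHEIT_TEMPERATURE : (["-"])? <DIGITS> " \u2109">
    // same shape, ending in U+2103 DEGREE CELSIUS
    | <CELSIUS_TEMPERATURE : (["-"])? <DIGITS> " \u2103">
    // private helper token (the # prefix): one or more ASCII digits
    | <#DIGITS : ["0"-"9"](["0"-"9"])*>
}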
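
For anyone who wants to check which strings these token shapes accept without running JavaCC, here is a small stand-in using `java.util.regex`. This is not JavaCC output and not how JavaCC matches tokens internally; the class name `TemperatureShapes` and the `classify` helper are hypothetical, and the two patterns are my hand translation of the token definitions above.

```java
import java.util.regex.Pattern;

final class TemperatureShapes {
    // Hand-translated equivalents of the two public tokens:
    // optional "-", one or more ASCII digits, one space, then the symbol.
    static final Pattern FAHRENHEIT = Pattern.compile("-?[0-9]+ \u2109");
    static final Pattern CELSIUS = Pattern.compile("-?[0-9]+ \u2103");

    // Returns the name of the token whose shape the whole string matches.
    static String classify(String s) {
        if (FAHRENHEIT.matcher(s).matches()) return "FAHRENHEIT_TEMPERATURE";
        if (CELSIUS.matcher(s).matches()) return "CELSIUS_TEMPERATURE";
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        System.out.println(classify("-40 \u2103")); // CELSIUS_TEMPERATURE
        System.out.println(classify("72 \u2109"));  // FAHRENHEIT_TEMPERATURE
        System.out.println(classify("72\u2109"));   // NO_MATCH (space is required)
    }
}
```

Note that the space before the degree symbol is part of the token, so input without it does not match.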


JavaCC doesn't yet handle supplementary characters (those outside the
Basic Multilingual Plane, i.e. code points above U+FFFF). But that's on
our radar, so we shall see...
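
To make the BMP limitation concrete: in Java, a supplementary character occupies two `char` values (a surrogate pair), so one way to guard against it is to scan the input by code point before lexing. The sketch below is only an illustration; `SupplementaryCheck` and `hasSupplementary` are names I made up, and U+1D11E (MUSICAL SYMBOL G CLEF) is just a convenient example of a code point above U+FFFF.

```java
final class SupplementaryCheck {
    // Walk the string one code point at a time and report whether any
    // code point lies outside the Basic Multilingual Plane.
    static boolean hasSupplementary(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                return true;
            }
            i += Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
        }
        return false;
    }

    public static void main(String[] args) {
        String celsius = "\u2103";                              // inside the BMP
        String gClef = new String(Character.toChars(0x1D11E));  // outside the BMP
        System.out.println(celsius.length() + " " + hasSupplementary(celsius)); // 1 false
        System.out.println(gClef.length() + " " + hasSupplementary(gClef));     // 2 true
    }
}
```

A check like this could reject or flag such input up front rather than letting a lexer see the two surrogate halves as separate characters.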


Yours,


Tom


