Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with Unicode!

Tommy Nordgren <tricky.tommy@comhem.se>
8 Mar 2007 19:54:50 -0500

          From comp.compilers

From: Tommy Nordgren <tricky.tommy@comhem.se>
Newsgroups: comp.compilers
Date: 8 Mar 2007 19:54:50 -0500
Organization: tommynordgren.com
References: 06-12-082
Keywords: Ada, parse
Posted-Date: 08 Mar 2007 19:54:50 EST

twometresteve@googlemail.com wrote:


> Hi there,
>
> I have a tool that parses Ada95 code and am investigating the
> possibility of updating it to support Ada2005.
>
> The biggest problem I am having at the moment is working out how to
> cope with Unicode characters. ...
> [The character set issues happen in the lexer which lex generates. A
> yacc parser sees only tokens. The question of unicode lexers has come
> up frequently over the past decade. See for example
> http://compilers.iecc.com/comparch/article/98-01-046 -John]


I suggest that you rewrite your grammar using the ANTLR tool
(www.antlr.org). ANTLR is quite powerful, and a single specification
file can specify lexers, parsers, and tree parsers/transformers.


Parsers can be generated in Java, C++, C# and Python.
(This applies to version 2.7.6.)
I don't know whether the later 3.0 series includes code generators for
languages other than Java, since I'm currently using 2.7.6.


ANTLR is written in Java, by the way.


ANTLR supports Unicode, but one point to consider with ANY tool is
that you will need a module that converts the input text files to
canonical UTF-16.
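
To illustrate the kind of conversion module I mean, here is a minimal
sketch in Python. It is not part of ANTLR. The function name to_utf16,
the Latin-1 fallback, and the reading of "canonical" as NFC-normalized
big-endian UTF-16 with a byte order mark are all my own assumptions;
adjust them to whatever your sources and your lexer actually expect.

```python
import codecs
import unicodedata

# Byte order marks to sniff. UTF-32 LE must be tried before UTF-16 LE,
# because the UTF-16 LE mark is a prefix of the UTF-32 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf16(raw: bytes, fallback_encoding: str = "latin-1") -> bytes:
    """Hypothetical helper: re-encode source bytes as big-endian UTF-16.

    Assumption: input without a byte order mark is in fallback_encoding
    (Latin-1 here, a common choice for Ada 95 sources).
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            text = raw[len(bom):].decode(name)
            break
    else:
        text = raw.decode(fallback_encoding)

    # Assumption: "canonical" means composed form (NFC), so that
    # 'e' + combining acute and precomposed 'é' reach the lexer identically.
    text = unicodedata.normalize("NFC", text)

    # Emit an explicit big-endian byte order mark followed by the text.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

With this in front of the lexer, a UTF-8 file, a Latin-1 file, and a
file using combining accents all produce the same byte stream for the
same visible text.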


The one thing to beware of when switching from yacc/bison is that
ANTLR doesn't support left-recursive rules. EBNF notation (repetition
and optional subrules), with embedded code fragments, can be used instead.
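
To show what that rewrite amounts to, here is a small hand-written
sketch in Python of the loop that an EBNF rule such as
expr : term (('+' | '-') term)* stands for, replacing the yacc-style
expr : expr '+' term | term. This is not ANTLR output; the names
tokenize and parse_expr and the toy integer grammar are invented for
the illustration.

```python
import re


def tokenize(src):
    """Toy lexer: integers plus '+' and '-'. Other characters are skipped."""
    return re.findall(r"\d+|[-+]", src)


def parse_expr(tokens):
    """Parse  expr : term (('+' | '-') term)*  into a nested tuple.

    The yacc original would be left-recursive:
        expr : expr '+' term | expr '-' term | term ;
    The loop below replaces that recursion and still builds a
    left-associative tree.
    """
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected an integer at token %d" % pos)
        value = int(tokens[pos])
        pos += 1
        return value

    tree = term()
    # The (...)* part of the EBNF rule: zero or more operator/term pairs.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        # Fold to the left, as the left-recursive rule would have done.
        tree = (op, tree, term())

    if pos != len(tokens):
        raise SyntaxError("unexpected token at position %d" % pos)
    return tree
```

For example, parse_expr(tokenize("1-2-3")) gives ('-', ('-', 1, 2), 3),
the same grouping the left-recursive yacc rule produces. In an ANTLR
grammar the tree-building step would live in the embedded code
fragments.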


If you are interested, ANTLR's primary architect, Terence Parr, is
currently writing a book about ANTLR 3.0 that will be published later
this year.

