Re: Evaluation of parser generators

Terence Parr <>
15 Dec 1997 22:02:14 -0500

          From comp.compilers

Related articles
Evaluation of parser generators (1997-12-10)
Re: Evaluation of parser generators (Basim Kadhim) (1997-12-12)
Re: Evaluation of parser generators (Scott Stanchfield) (1997-12-12)
Re: Evaluation of parser generators (1997-12-14)
Re: Evaluation of parser generators (1997-12-14)
Re: Evaluation of parser generators cwilson@Xenon.Stanford.EDU (1997-12-15)
Re: Evaluation of parser generators (Terence Parr) (1997-12-15)
Re: Evaluation of parser generators (Dave Love) (1997-12-23)
| List of all articles for this month |

From: Terence Parr <>
Newsgroups: comp.compilers
Date: 15 Dec 1997 22:02:14 -0500
Organization: MageLang Insitute
References: 97-12-061
Keywords: tools, PCCTS

> (Wclodius) spake:
> 1. PCCTS 1.3x - well documented and widely used, but much of the
> technology has been superceded by ANTLR 2.2x. Output in C or C++.

> 2. ANTLR 2.2x - not as well documented or widely used, but a more
> elegant implementation than PCCTS 1.3x. It appears to have been
> developed to allow code generation in more than the original output
> language, Java, but that capability seems not to have been used by
> anyone at this time.

Couple of notes from the primary author of both. :)Both PCCTS and
ANTLR are completely public-domain. No copyright on the source or
restrictions of any kind. :)

SORCERER for PCCTS and ANTLR itself 2.xx does very nice tree walking.
For example, this grammar parses trees with INTs as leaves and MULT as
subtree roots.

                : #( MULT expr expr )
                | INT

There is support for tree rewriting (e.g., you can easily do symbolic
differentiation of polynomials) etc...

With regards, to lexical analysis, ANTLR 2.xx generates
predicated-LL(k) lexers (recursive descent). So, you can use k or
arbitrary lookahead and semantic information to guide the parse. You
can easily parse really nasty stuff like FORTRAN. The new approach is
not without cost: the common method of just listing a bunch of
expressions doesn't work too well due to common left-prefixes (such as
INT vs FLOAT). However, the benefits of strength and speed are
tremendous. Oh, ANTLR 2.xx cannot do UNICODE yet. The 2.20 version
(in beta) includes very nice text manipulation

features for token defs. For example, the <br> HTML tag might be
recognized as follows:

BR : '<'! "br" '>'! ; // discard < and >

The '!' operator indicates that the <,> should be on
rules too. Labeling a rule reference from within the lexer allows you
to access the text matched for just that piece of the input.

One last thing concering lexing: you can pass heterogeneous token
objects back to the parser (i.e., different types of objects for the
various HTML tags).

ANTLR 2.20 (out in a few weeks) supports grammar inheritance so that
you can define a new grammar as it differs from an old grammar. :)
Just override the appropriate rules!

ANTLR generalizes the notion of recognition in the sense that you use
the same technique for specification and parsing for chars, tokens, or
tree nodes.

C++ output from ANTLR 2.xx is at least 6 months away I'd say. I have
to convince John Lilley to take time away from making money. ;)

Hope this helps with your survey,
Terence Parr
MageLang Institute

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.