Re: Separation of parsing from AST construction and attribute evaluation

Chris F Clark <>
4 Oct 2004 00:42:26 -0400

          From comp.compilers

Related articles
Separation of parsing from AST construction and attribute evaluation (2004-10-02)
Re: Separation of parsing from AST construction and attribute evaluati (Richard Pennington) (2004-10-02)
Re: Separation of parsing from AST construction and attribute evaluati (Chris F Clark) (2004-10-04)
RE: Separation of parsing from AST construction and attribute evaluati (Quinn Tyler Jackson) (2004-10-09)
Re: Separation of parsing from AST construction and attribute evaluati (Rodney M. Bates) (2004-10-09)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 4 Oct 2004 00:42:26 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 04-10-006
Keywords: parse, practice
Posted-Date: 04 Oct 2004 00:42:26 EDT

Raplph Boland wrote:
> I am designing a parser generator tool and I want it to work such that
> the specification for parsing an input language is entirely
> implementation language independant. Thus the same specification may
> be used to construct a parser in several different implementation
> languages.

Over the years, this has been tried several times. Most of the times,
it has been done by simply having the parser generate an "AST" and the
user can attach the back-end portion to the AST in an implementation
language. I think Paul Mann's parsers (e.g. LALR, Autom) used this
approach quite effectively. I believe Visual Parse++ does the same
thing. The parser code has only one small amount of anotation
associatated with it--the type of AST node to construct (and if you
are willing to live with a parse tree for an AST, there is no notation
needed at all). The disadvantage of this approach is that one is
usaually limited in the kind of AST's one can construct. As one
starts making the AST construction mechanism increasingly powerful,
one finds that it tends to become a full-grown programming language.

However, more importantly, I don't think that separating the grammar
from the implementation semantics (in terms of implementation
language) is that significant an issue. I do think being able to
define certain parts of the semantics in distinct files (possibly
using distinct notations) is a good thing. For example, Eli has
special notations that handle different parts of the semantic
specification, such as OIL. If I recall correctly, the ASF+SDF kits
also have multiple layers. The Cocktail toolkit also uses mutiple
input files to describe different aspects. However, I think in each
of these approaches, it was still possible (and often necessary) to
have some amount of the semantic specification still co-located
(i.e. intermixed) with the grammar.

The reason why that is the case is that the grammar describes many
aspects of the input language quite succinctly. If one tries to
extract the relevant information into a separate input, one finds that
one comes close to duplicating the grammar.

On this particular topic, I would like to mention the concept of
"modular attribute grammars" (that isn't quite the right paper title).
This was an important advance in the field. The idea is that the
copy-rules and other boilerplate needed to move attributes around in
the tree from their points of definition to their points of use could
be inferred by the tool. This allowed one to only attach actions at
key points in the grammar, with the tool inferring the missing
actions. More importantly, it allowed the specification of attributes
in an OO-like way, where the grammar could have attributes added
incrementally in a piece-meal fashion.

The much loved/hated visitor pattern is another approach to the same
piece-meal semantic definition problem. I find it quite instructive
to see that two of the most popular visitor generator tools (JJtree
and jtb) are integrated with the underlying grammars.

Ok, why do I think that being able to separate the semantics into a
separate file is not that important? The answer is that most people
are more likely to switch parser generator tools than they are to
switch implementation languages. In fact, most parser generation
tools get rather intimately tied to one specific programming language
implementation and the tools lifetime is usually relatively short.
The problem being that there are a continual stream of "new" parser
generators on the market, although most of them don't last past the
individual implementors infatuation with them. Part of the reason for
this stream is that it is simply too easy to write one's own parser
generator. The result of this, is that quite often a new project will
be have its own parser generator with it's own idiosyncratic BNF
dialect written and that will be tied to one implementation.

So, what is needed is not the ability to separate the implementation
lanugage from the grammar, but rather a parser generator than can
handle the various idiosyncratic parser notations (and generate the
various output API's). I'm hoping that in the 3.x time frame Yacc++
will get a little closer to doing that, by supporting some of the
other common dialects, such as PCCTS and JavaCC.

BTW, if someone is interested in that project, the resulting version
will be released open-source (under the GPL), so you can contact me
and I can get you some preliminary code to work with under the GPL.
Note, we will require copyright assignment back of the resulting code
as we will also be releasing non-open-source code from the same code


Chris Clark Internet :
Compiler Resources, Inc. Web Site :
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.