RE: Separation of parsing from AST construction and attribute evaluation

Quinn Tyler Jackson <quinn-j@shaw.ca>
9 Oct 2004 22:29:35 -0400

          From comp.compilers

Related articles
RE: Separation of parsing from AST construction and attribute evaluati quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-10-02)
Re: Separation of parsing from AST construction and attribute evaluati cfc@shell01.TheWorld.com (Chris F Clark) (2004-10-04)
RE: Separation of parsing from AST construction and attribute evaluati quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-10-09)
| List of all articles for this month |

From: Quinn Tyler Jackson <quinn-j@shaw.ca>
Newsgroups: comp.compilers
Date: 9 Oct 2004 22:29:35 -0400
Organization: Compilers Central
References: 04-10-043
Keywords: parse, practice
Posted-Date: 09 Oct 2004 22:29:35 EDT

Chris F Clark said:


> Over the years, this has been tried several times. Most of the times,
> it has been done by simply having the parser generate an "AST" and the
> user can attach the back-end portion to the AST in an implementation
> language. I think Paul Mann's parsers (e.g. LALR, Autom) used this
> approach quite effectively. I believe Visual Parse++ does the same
> thing. The parser code has only one small amount of anotation
> associatated with it--the type of AST node to construct (and if you
> are willing to live with a parse tree for an AST, there is no notation
> needed at all). The disadvantage of this approach is that one is
> usaually limited in the kind of AST's one can construct. As one
> starts making the AST construction mechanism increasingly powerful,
> one finds that it tends to become a full-grown programming language.


Trees are a side-effect of parses done with the meta-s engine. Tree
directives can be added to productions to massage how the tree is
generated, however. Also, with the Lua code in reductions, nodes can
be manipulated explicitly if the productions require it.


This is particularly useful in expression parsing because $-grammars
cannot have left recursion, and right-recursive expression grammars
produce horrible trees with many intermediate nodes that have only one
child. The "#refine(new_name_for_node)" directive "solves" this by
remove intermediate nodes between the bottom of the branch and the
production where the #refine(n) occurs, producing trees that don't
have long chains of meaningless intermediate garbage in them. For
example, here's a snippet from a simple expression grammar:


expr ::= expr_add [#? "," #? expr #refine(expr_seq)]; // expr
expr_add ::= expr_mult #refine(expr) [#? $::<-('[-+]') #? expr_add
#refine(expr_add)]; // expr_add
expr_mult ::= expr_pow #refine(expr) [#? $::<-('[*/]') #? expr_mult
#refine(expr_mult)]; // expr_mult
expr_pow ::= expr_simple #refine(expr) [#? $::<-("^") #? expr_pow
#refine(expr_pow)]; // expr_pow
expr_simple ::= expr_paren | expr_bind | expr_integer | expr_variable; //
expr_simple
expr_paren ::= "(" #? expr #? ")"; // expr_paren
expr_bind ::= "{" #? $name(expr_var) #? ":" #? expr #? "}"; // expr_bind


In the case of expr_add -- suppose the parse actually descends into
expr_pow. Instead of a tree with this:


expr
expr_add
expr_mult
expr_pow


We get this:


expr
expr_add


I realize this is not terribly useful in the general case as regards AST
production, but it cleans up the nasty case of languages like C++ with many
different operators each with different precedence.


By the time the reduction occurs for expr, the reduction code will see only
the massaged version of the tree.


> The much loved/hated visitor pattern is another approach to the same
> piece-meal semantic definition problem. I find it quite instructive
> to see that two of the most popular visitor generator tools (JJtree
> and jtb) are integrated with the underlying grammars.


The meta-s engine also has visitation built in. When the parse is finalized,
visits occur on a production bases. If one is using the C++ version, one has
a visitor for each node-type (say, ExprVisitor). With the Lua version, one
simply does this:


grammar Foo host Lua
{
S ::= (a b ",")+;
a ::= '[a-z]+';
b ::= '[0-9]+';


@function_impls = :{
function b_visit(N, H)
print("Node type b visited. Lexeme = " .. N:GetLexeme(H));
end
}:;
};


Given the input abc123def456, the above would output:


Node type b visited. Lexeme = 123
Node type b visited. Lexeme = 456


That is to say, all the b nodes would be visited after S accepts the string
in its entirety.


Finally, the #notree directive is just that -- productions with this in them
don't produce nodes on the tree -- such as the __ws production (whitespace).


> Ok, why do I think that being able to separate the semantics into a
> separate file is not that important?


In my case -- one reason it's not so important is that many "semantic"
things can be done in the grammar specification itself -- thereby relieving
the need for so much code. I'm sure this violates the idea of separation of
syntax and semantics, but in fact, simply reduces what "used to be
considered semantics" to a matter of syntax -- not quite the same thing in
my mind.


Cheers.


--
Quinn Tyler Jackson
http://members.shaw.ca/qjackson/


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.