Re: Rewriting as a means for translating

"Terence Parr" <parrt@s1.arc.umn.edu>
Tue, 8 Feb 1994 19:32:40 GMT

          From comp.compilers

Related articles
Rewriting as a means for translating hbaker@netcom.com (1994-02-04)
Re: Rewriting as a means for translating schrod@iti.informatik.th-darmstadt.de (1994-02-07)
Re: Rewriting as a means for translating parrt@s1.arc.umn.edu (Terence Parr) (1994-02-08)
| List of all articles for this month |
Newsgroups: comp.compilers
From: "Terence Parr" <parrt@s1.arc.umn.edu>
Keywords: translator, tools
Organization: Compilers Central
References: 94-02-032 94-02-043
Date: Tue, 8 Feb 1994 19:32:40 GMT



hbaker@netcom.com (Henry G. Baker) writes:
> In my experience, the idea of source-to-source translation by means of
> a simple rewriting system is always a loser. No matter how close the
> languages seem, there are always significant "gotchas".


Joachim Schrod (schrod@iti.informatik.th-darmstadt.de) writes:
> I agree with you, but how do you find the idea of Terence Parr to use
> a high-level specification of tree transformations?


I also agree with the current consensus whole-heartedly, but don't deserve
credit for tree pattern matching of intermediate trees--I have simply
applied the cool pattern matching ideas of code-generator generators to
source-to-source translation. From my experience translating large
scientific serial Fortran-77 programs to Connection Machine Fortran, it
has become obvious that having a front-end build an intermediate
representation (IR) and walking the IR *multiple* times is the best way to
go. Each stage of the IR translation can then be developed, debugged, and
used independently. Trying to effect the entire translation with a few
attribute-evaluation functions is bogus IMHO. Further, "complete"
source-to-source tools generally force you to use a particular parser
generator; anyone who subscribes to this newsgroup will agree that
programmers do not agree on which parser generator to use.


Here is most of the abstract from the SORCERER overview paper:


Despite the sophistication of code-generator generators and
source-to-source translator generators (such as attribute-grammar based
tools), programmers often choose to build tree parsers by hand for source
translation problems. In many cases, a programmer has a front-end that
constructs intermediate form trees and simply wants to traverse the trees
and execute a few actions. In this case, the optimal tree walks of
code-generator generators and the powerful attribute evaluation schemes of
source-to-source translator systems are overkill; programmers would rather
avoid the overhead and complexity.


SORCERER is more suitable for the class of translation problems lying
between those solved by code-generator generators and by full
source-to-source translator generators. SORCERER generates simple,
flexible, top-down, tree parsers that, in contrast to code-generators, may
execute actions at any point during a tree walk. SORCERER accepts
extended BNF notation, allows predicates to direct the tree walk with
semantic and syntactic context information, and does not rely on any
particular intermediate form, parser generator, or other pre-existing
application.


> It's described in his paper ``An Overview of SORCERER: A Simple
> Tree-Parser Generator.''
> [...]
> At least, I'm still waiting for the release of the SORCERER source...


I have redesigned the parsers generated by SORCERER (previously I
generated a grammar which was then run through ANTLR to get the tree
parser). The new output is really simple and directly encodes a tree
walker like a recursive-descent parser; predicated-LL(1) works well for IR
walking because people design their IRs deliberately to be easy to parse.
I spent last week writing a 25 page document describing the total redesign
of the tree parsers generated by SORCERER. It also describes much of the
guts of SORCERER. From this design document, coding it up was easy. I
have most of it done and hope to have it to folks like Gary Funck at
Intrepid in CA for beta testing by the end of the week. SO, hopefully
sometime next week SORCERER will released.


Oh, by the way, part of the reason for the delay was that I had to
redefine the normal lookahead operations for use in the tree matching
environment; weird problems arise. For example, consider the simple
grammar fragment:


a : #(A b) B
    ;


b : E
    | /* epsilon-transfer */
    ;


where #(root sibling_1 ... sibling_n) is a tree pattern; therefore, rule
'a' matches trees that looks like:


                A--B and A--B
                |
                E


The lookahead set (in the LL(1) sense) of the first production of 'a' is
trivially {A}. The lookahead set of the first production of 'b' is
trivially {E}. HOWEVER, what lookahead predicts the second production of
'b'? Is it {B} (the FOLLOW(b))? Nope! It's {$} where '$' means
end-of-input. Notice that nothing can truly follow references to rule
'b'; 'B' must be matched at a different tree level. When rule 'a' looks
for a nonexistent child by calling rule 'b', the second alternative
production of 'b' will match it by ensuring that the tree pointer is NULL.
Here is the output of SORCERER:


void a(AST **root)
{
                AST *_t = *root;
                if ( (_t->token==A) ) {
                                {_SAVE;
                                _MATCH(A); _DOWN;
                                  b(_t);
                                _RESTORE;}
                                _RIGHT;
                                _MATCH(B);
                                _RIGHT;
                }
                else {
                                no_viable_alt("a", _t);
                }
                *root = _t;
}


void b(AST **root)
{
                AST *_t = *root;
                if ( (_t->token==E) ) {
                                _MATCH(E);
                                _RIGHT;
                }
                else {
                                if ( (_t==NULL) ) {
                                }
                }
                else {
                                no_viable_alt("b", _t);
                }
                *root = _t;
}


Note that switch-statements are not used in this case because
'_t==NULL' cannot be turned into a case of a "switch (_t->token) {...}".


Regards, Terence Parr U of MN, AHPCRC
"Why program by hand in 5 days what you can spend 5 years of your
life automating?"
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.