Re: source-to-source translators, what language for unparsers

"Ira Baxter" <idbaxter@semdesigns.com>
Wed, 6 May 2009 19:57:32 -0500

          From comp.compilers

Related articles
source-to-source translators, what language for unparsers cfc@shell01.TheWorld.com (Chris F Clark) (2009-05-05)
Re: source-to-source translators, what language for unparsers sh006d3592@blueyonder.co.uk (Stephen Horne) (2009-05-06)
Re: source-to-source translators, what language for unparsers rtrnews@googlemail.com (2009-05-06)
Re: source-to-source translators, what language for unparsers idbaxter@semdesigns.com (Ira Baxter) (2009-05-06)
Re: source-to-source translators, what language for unparsers Ruslan@Shevchenko.Kiev.UA (rssh) (2009-05-08)
Re: source-to-source translators, what language for unparsers bear@sonic.net (Ray Dillinger) (2009-05-09)
Re: source-to-source translators, what language for unparsers gopi.onthemove@gmail.com (2009-06-03)
| List of all articles for this month |

From: "Ira Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: Wed, 6 May 2009 19:57:32 -0500
Organization: Compilers Central
References: 09-05-026
Keywords: translator
Posted-Date: 07 May 2009 07:34:27 EDT

"Chris F Clark" <cfc@shell01.TheWorld.com> wrote in message
>I was recently working on the output side of Yacc++ and realized I
> want something better than what I have. In paricular, I want
> something where I can quickly take an example output (for some example
> input), add some fairly minimal markup and have it turn that into the
> appropriate sequence of output commands that recreate that output
> given that input. Repeat as needed with different sets until the
> program covers all the cases.
>
....
> 5) I believe there are also systems that unparse the AST, with
> annotations for what to generate embedded in the AST. That
> probably would work well with a visitor paradigm where one could
> define an order to walk the AST.


So the assumption is you build the AST to represent the code you want
to generate. That matches our model.


What we use in the DMS Software Reengineering Toolkit is a "box"
language, in which AST "nodes" (in our case, grammar rules) are
annotated with box building notations to enable horizontal H(a,b,c),
vertical (V(a,b,), and indented I(x) boxes of text be constructed from
boxes of text synthesized by children nodes. (In fact, this is
implemented as an attribute grammar). Terminal nodes generate small
boxes of text that are one line high and n characters wide
corresponding to the terminal value as string, either as a literal
(see 'DO' below) or as the terminal value converted back to text (see
"name" below).


Here's an example from our Fortran 90 front end (which combine grammar
rules and prettyprinting into a single document). The -- Rnnn
notations are references to the original F90 specification; we follow
standards very closely where possible.


----------- Example grammar/prettyprinting rules ------


label_do_stmt = optional_loop_name 'DO' NATURAL_LITERAL
optional_loop_control STMT_END ; -- R819
      <<PrettyPrinter>>:
            { V(optional_loop_name,
                    H('DO',NATURAL_LITERAL,optional_loop_control,STMT_END)); }
optional_loop_name = ; -- R819
optional_loop_name = name ':' ; -- R819
      <<PrettyPrinter>>: { H*; }
optional_loop_control = ; -- R819
optional_loop_control = loop_control ; -- R819


outer_shared_do_construct =
      label_do_stmt
      do_body
      shared_term_do_construct ; -- R830
      VerifyOuterSharedDoConstructLabel
      <<PrettyPrinter>>:
            { V(H(label_do_stmt),
                        I(do_body),
                        shared_term_do_construct); }


shared_term_do_construct = outer_shared_do_construct ; -- R831
shared_term_do_construct = inner_shared_do_construct ; -- R831


inner_shared_do_construct =
      label_do_stmt
      do_body
      NATURAL_LITERAL do_terminated_shared_stmt ; -- R832
      VerifyInnerSharedDoConstructLabel
      <<PrettyPrinter>>:
            { V(H(label_do_stmt),
                        I(do_body),
                        H(NATURAL_LITERAL,do_terminated_shared_stmt)); }


do_terminated_shared_stmt = continue_stmt ; -- R832
do_terminated_shared_stmt = do_terminated_action_stmt ; -- R832


----- End example ------


This generally produces quite nice output. We allow conditional box
expressions IF(cond,box1,box2) to allow choice between two constructed
box expressions. To do this right, the attribute evaluator does
top-down evaluation so that it doesn't construct both sets of boxes,
either of which might be quite large.


We in fact build source code prettyprinters this way. Simply parse to
AST using the grammar rules, then unparse using the prettyprinting
rules.


[Note that we don't bother with tree building rules. Our trees are
isomorphic to the grammars, which makes building a language parser a
lot easier when you have several thousand rules (COBOL). Who wants to
invent all those node types? To keep trees sizes small, we do
surpress non-value-carrying terminals and useless unary productions,
and we do automatically manufacture lists. The point is the grammar
engineer doesn't have to think about any of this. And the resulting
trees are remarkably like ASTs that are hand-coded.]


What this doesn't do nicely is produce pretty output where two
constructs interact. For this case, we are considering using source
code patterns, which DMS captures as trees with positioning
information. Example (chosen to be odd to make the point):


        pattern
nest-elsedo-deeper-than-then(c:expression,s1:statements,s2:statements):statement
            " IF (\c)
                    THEN
                            \s1
                    ELSE DO \s2
                                        END".


This pattern is captured by DMS with the metavariable s1 and s2 having
starting column numbers that don't match. Then one could generate
output text according to the pattern and relative column offsets of
the elements. The value in this is that one can have patterns that
are specific to particular code idioms to get desired layout for those
idioms. You'd still want to back this up with the box langauge to
handle the general case, e.g., where no idiom applied.


The ASF-SDF (and Stratgo, I think) program transformation engines
carry out the box language idea to the extreme. Thier model is that
one applies program transformation rules that map your
programming-language AST into literally an AST composed of box
operators. You can then apply box-transformations to that result to
reshape it further, presumably to achieve something like the
pattern-based rule above. I'm not sure I'm comfortable with that
idea, as the box langauge doesn't understand the programming langauge,
and it isn't clear you could make rules that change layout depending
on the DO construct and have it preserved.




--
Ira Baxter, CTO
www.semanticdesigns.com



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.