Re: Help and ideas with C-like to C transformation tool (long)

"Ira D. Baxter" <>
29 Dec 2001 13:24:42 -0500

          From comp.compilers


From: "Ira D. Baxter" <>
Newsgroups: comp.compilers
Date: 29 Dec 2001 13:24:42 -0500
Organization: Compilers Central
References: 01-12-162
Keywords: translator, tools
Posted-Date: 29 Dec 2001 13:24:42 EST

I think our DMS Software Reengineering Toolkit is designed to carry
out just exactly the kind of translation you wish to do. DMS could
completely implement your current translation facility. (That wouldn't
buy you anything, but it demonstrates the power of the tool).

DMS is grammar driven. Since you have a well-defined lexer/grammar
pair already, it should be quite straightforward to modify them for
use by DMS. DMS's context-free parser can surely handle your LALR(1)
yacc grammar. And once you've defined the grammar to DMS, DMS will
automatically build ASTs for you. DMS also works with multiple
languages at the same time, and a well-tested C grammar is available,
so it can work with both your legacy language definition and "C" at
the same time.

You say you still need to do name/type resolution. DMS provides
attribute grammar evaluation tools, coupled with a library for
managing symbol tables, to enable virtually any scoping/typing system
to be "easily" accommodated.
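As a rough illustration of the symbol-table half of that sentence (this is not DMS code; the class, its methods, and the type labels are all hypothetical), a scoped table can be sketched in Python as a chain of dictionaries searched from the innermost scope outward:

```python
class SymbolTable:
    """One scope in a chain; lookups walk outward toward the global scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def declare(self, name, datatype):
        # Record a name in this scope only (shadows any outer declaration).
        self.entries[name] = datatype

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        return None  # name is not declared in any enclosing scope


# Hypothetical declarations: "an" as a numeric array, "s" local to a block.
global_scope = SymbolTable()
global_scope.declare("an", "numeric_array")
block_scope = SymbolTable(parent=global_scope)
block_scope.declare("s", "string")
```

A name-resolution pass would call `declare` at each declaration and `lookup` at each use while walking the tree; how the walk is driven is left out of this sketch.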

Finally, you want to be able to code specific translations. DMS
enables you to code translation rules in the surface syntax of the
source and target languages. This makes such translation rules easy
to write, and far easier to review later or with others than the
equivalent procedural encoding. These
translation rules are automatically converted into a tree-rewriting
system, and applied to any designed parse trees.

Your assignment example "an(1)=an(0)" would be coded roughly as:

      source domain Clike.
      target domain C.

rule translate_array_source(a:NAME,index:Clike_expression):Clike_expression -> C_expression
                = " \a ( \index ) " -> "fetchnumber( \translate_NAME\(\a) , \index )"
                    if datatype_is_numeric(\a).

:Clike_expression -> C_expression
                = " \a ( \index ) = \e " -> "storenumber( \translate_NAME\(\a) , \index, \e )"
                    if datatype_is_numeric(\a).

The "source" and "target" phrases tell the pattern language which
parsers to (ab)use to parse the left and right hand sides of the
patterns (separated by "->"). The "rule" phrase gives a rule a name,
and parameterizes the rule by subtrees. The first (a->b) pair
describes the syntax context in which the rule should be applied, and
what the resulting syntax context will be. The second (a->b) pair
after the equal sign defines the source syntax and translated syntax
in domain-language terms (left-hand side expressed in Clike;
right-hand side in C). Parameters are escaped by the "\" character.

The "if ..." phrase makes the rewrites conditional, so you can
write translation rules with the same left-hand-side syntax, but
discriminated on type information.
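For readers without DMS at hand, the effect of the two array rules can be imitated in plain Python. This is only a sketch: it is a hand-written recursive function rather than a rule engine, the tuple-shaped trees are an assumed representation, and the `SYMBOLS` table (including the idea that it records the C-side name "a" for the Clike name "an") is a hypothetical stand-in for real name/type resolution.

```python
# Hypothetical symbol table: Clike name -> (datatype, C-side name).
SYMBOLS = {"an": ("numeric", "a"), "sn": ("string", "s")}


def datatype_is_numeric(name):
    return SYMBOLS[name][0] == "numeric"


def translate_name(name):
    return SYMBOLS[name][1]


def translate_expr(node):
    """Translate a Clike expression tree into C source text."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "index":  # Clike syntax: a ( index )
        _, name, index = node
        if datatype_is_numeric(name):
            return f"fetchnumber({translate_name(name)},{translate_expr(index)})"
    if kind == "assign":  # Clike syntax: a ( index ) = e
        _, target, value = node
        if target[0] == "index" and datatype_is_numeric(target[1]):
            _, name, index = target
            return (f"storenumber({translate_name(name)},"
                    f"{translate_expr(index)},{translate_expr(value)})")
    # No rule matched: the condition failed or the construct is not covered.
    raise NotImplementedError(f"no translation for {node!r}")


# an(1)=an(0)
example = ("assign", ("index", "an", ("num", 1)), ("index", "an", ("num", 0)))
print(translate_expr(example))  # prints storenumber(a,1,fetchnumber(a,0))
```

The type test plays the role of the "if" phrase: the same surface syntax on a non-numeric name falls through to a different rule, which in this sketch simply does not exist yet.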

Your while rule:

ment -> C_statement
                = "WHILE \condition DO \body" -> "while ( \condition ) { \body }".

These rules are applied everywhere automatically and repeatedly unless
you tell DMS to constrain their application scope.
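"Everywhere and repeatedly" can be pictured as a loop that keeps rewriting the tree until no rule fires. The Python below is a minimal sketch of that idea, not a description of how DMS schedules rules; the tuple trees, the node labels `Clike_while` and `C_while`, and the pass limit are all assumptions made for illustration.

```python
def while_rule(node):
    """Mirror of the WHILE rule: a Clike while node becomes a C while node."""
    if isinstance(node, tuple) and node[0] == "Clike_while":
        _, condition, body = node
        return ("C_while", condition, body)
    return None  # rule does not match here


def rewrite_once(node, rules):
    """Rewrite children first, then try each rule at this node.

    Returns (new_node, changed)."""
    changed = False
    if isinstance(node, tuple):
        new_children = []
        for child in node[1:]:
            new_child, child_changed = rewrite_once(child, rules)
            new_children.append(new_child)
            changed = changed or child_changed
        node = (node[0],) + tuple(new_children)
    for rule in rules:
        result = rule(node)
        if result is not None:
            return result, True
    return node, changed


def rewrite_to_fixpoint(node, rules, max_passes=100):
    """Apply the rules over the whole tree until a pass changes nothing."""
    for _ in range(max_passes):
        node, changed = rewrite_once(node, rules)
        if not changed:
            return node
    raise RuntimeError("rules did not reach a fixpoint")


# A WHILE nested inside another WHILE; leaves are plain strings here.
tree = ("Clike_while", "c1", ("Clike_while", "c2", "stmt"))
result = rewrite_to_fixpoint(tree, [while_rule])
```

Both loops are rewritten without the rule author saying where to look, which is the point of the sentence above. Restricting the application scope would correspond to calling the rewriter on a chosen subtree only.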

Now, DMS insists that a "Clike" structure is NOT a "C" structure. So
you actually have to translate "everything", although you can decide
to write trivial translations where the Clike and C grammars
happen to agree. This accounts for the odd-looking (but
trivial) "\translate_NAME" function that converts the leaf
tree containing a Clike identifier to the leaf tree containing a
C identifier. Often such functions have to handle some silly change in
lexical rules (e.g., mapping the "%" character, legal in Clike
identifiers, to "_" in C).
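A name-mapping function of this kind might look like the following Python sketch. The function name and the validity check are assumptions; only the "%" to "_" mapping comes from the text above.

```python
import re


def translate_name(clike_identifier):
    """Map a Clike identifier to a C identifier by replacing "%" with "_"."""
    c_identifier = clike_identifier.replace("%", "_")
    # Guard: whatever comes out must be a lexically valid C identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", c_identifier):
        raise ValueError(f"cannot map {clike_identifier!r} to a C identifier")
    return c_identifier
```

Note that such a mapping is not necessarily one-to-one: `total%count` and `total_count` would both become `total_count`, so a real translator would also need to detect collisions.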

What this means for DMS is that it is not possible to define the
translation without doing it for the entire language syntax.
(Actually, the DMS prettyprinter prints a "domain switch" marker if it
transitions across a child arc from a Clike-parent node to a C-child
node; if one suppresses this, then the C-like code that looks like C
would be intermixed with C code, giving you C output. But in essence
what you've done in this case is to say that all untranslated C-like
constructs *are* C constructs, by cheating).
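The domain-switch idea can be sketched in Python as a printer that tags every node with its domain and emits a marker whenever a child's domain differs from its parent's. Everything here is hypothetical, including the `<<C>>` marker syntax and the node shape; it also marks any change of domain, which is a simplification of the Clike-parent to C-child case described above.

```python
def prettyprint(node, parent_domain=None, show_switch=True):
    """Print a tree of (domain, text, children) nodes as one line of text."""
    domain, text, children = node
    parts = []
    if show_switch and parent_domain is not None and domain != parent_domain:
        parts.append(f"<<{domain}>>")  # hypothetical domain-switch marker
    parts.append(text)
    for child in children:
        parts.append(prettyprint(child, domain, show_switch))
    return " ".join(parts)


# A half-translated loop: the condition is already C, the rest is still Clike.
tree = ("Clike", "WHILE", [("C", "fetchnumber(a,0)", []), ("Clike", "DO", [])])
marked = prettyprint(tree)
unmarked = prettyprint(tree, show_switch=False)
```

With the marker shown, the mixed tree is visibly incomplete; with it suppressed, the output reads as ordinary text and any untranslated Clike construct is silently passed through.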

Having said all this, what is "easy" is relative to your alternatives.
If you don't have a tool like DMS, building a robust translator like
this is a bear. Using DMS to accomplish a translation like this is
"easy" (i.e., a DMS expert can build a full-up translator for a
language like JOVIAL to C (essentially identical in concepts,
different in every microscopic detail) in about 8 man-months). In
your case, you can also consider altering your present translator, at
a cost I can't predict.

I'd expect your task to be somewhat shorter, but YMMV depending on
skills and actual ambition (and like all tools, there is a training
curve). For instance, you are suggesting that you want to be able to
write not only standard constructs in your legacy language, but any
and all legal C constructs in your legacy language, too. For DMS at
least, this means adding all the C constructs you want to your legacy
language grammar for starters. You may find that there are some funny
type interactions you didn't expect (which is why language design is a
bitch), induced by conflicts between Clike goals and C goals. You'll
have to resolve them using translation rules, and half the battle here
is deciding what they mean.

Ira D. Baxter, Ph.D. CTO Semantic Designs

"Corey Stup" <> wrote in message
> We have a legacy language, which is very C-like, but with some added
> features such as type'd defines, library function overloading and some
> other features. [ inhouse tool translates it to C...]
> ....I don't want to verify all constructs
> (letting the C compiler do that work after its passed off..), but I
> still need its macro, shorthand, and some of its symbol table
> functions.
> Using [type information] a function call would generate a different C
> runtime-function call to be output to the C code.
> ....So, our tool will still need to do this type of analysis.
> What I *DON'T* want it to need to understand is valid "" or
> "if..then..else" syntax. I'd just like any and all constructs to just
> be passed through and have the C compiler do its work to validate the
> code.
> For another example, ...simple assignments
> ... an(1)=an(0);
> Which means, set numeric array index 1 with the numeric value of array
> index 0, gets transformed into the C code as:
> storenumber(a,1,fetchnumber(a,0));
> A while-do construct in our language would be written as:
> WHILE an(standard_define)=TRUE DO
> /* */
> This gets translated into:
> while (fetchnumber(a,0)==TRUE) {
> /**/
> }
> I guess what I'm getting at is this:
> Is it possible to define a parser that can find expressions such as
> "an(0)" or "an(standard_define)" and replace those, no matter where
> they are in a stream, without understanding the entire language
> syntax?
> If anyone has any suggestions on a toolset to perform this type of
> task, please email or post followups to this post. I can provide more
> examples if anyone is interested.
