|WANTED - info. on cross-progrmming-language translation email@example.com (1993-11-16)|
|Program translation issues synaptx!thymus!daveg@uunet.UU.NET (Dave Gillespie) (1993-11-18)|
|From:||Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>|
|Date:||Thu, 18 Nov 1993 01:36:32 GMT|
Gary Funck writes:
> [...] P2c is an impressive piece of work,
> but its
> structure does not lend itself well to our application; we need something
> that lets us make several passes over the code, and to make various
> parse-tree transformations before generating the translated code [...]
P2c has grown pretty ad-hoc inside as it has expanded to deal with more
crazy dialects of Pascal and C. You can find examples of quite a few
techniques in various places, and in particular there are some rule-based
P2c's general strategy is to read and parse the Pascal source into a tree
form, performing various rewrites along the way; then it makes another
pass of rewrites that must be done after a whole procedure has been
parsed; and finally it generates code directly from the trees. Pascal and
C are similar enough that I can get away with a single unified (read:
kludgey) tree structure.
A simple example of the rewrites that p2c includes is the basic arithmetic
optimizations, e.g., converting "i - 1 + 1" to "i". (Such expressions
tend to arise in index computations, since Pascal arrays often start at 1
but C arrays start at 0.)
Some of the more ambitious rewrites come in the handling of strings.
String expressions like "s := s1 + '/' + sub(s2,1,10)" are generally
converted to nested calls to "sprintf", which serves as an all-purpose
string operator. Each sprintf gets its own temporary variable as a
destination. Then, various rules implementing a "sprintf algebra" reduce
the nested calls into a single sprintf. Often even the last temporary can
be eliminated. The above statement follows a path roughly like this:
strcpy(s, sprintf(T1, "%s%s%s", s1, "/",
sprintf(T2, "%.*s", 10, &s2[1-1])));
strcpy(s, sprintf(T1, "%s/%.*s", s1, 10, s2));
strcpy(s, sprintf(T1, "%s/%.10s", s1, s2));
sprintf(s, "%s/%.10s", s1, s2);
If the string expression is going to be an argument to a "printf", it can
be folded straight into the "printf" control string and again the
temporary can be eliminated.
Since some versions of "sprintf" don't return their first argument, p2c
can rewrite "sprintf(T,...)" to "sprintf(T,...),T" (using C's comma
operator). Later rewrites can usually remove the ugly comma operators by
pulling the expression apart into several statements.
It would be nice to do the rewrites using a clean table-driven
pattern-matching engine, but, in keeping with the "ad-hoc" theme, it's all
just done with crufty C code. If I wrote p2c over again, knowing how
ambitious it was finally going to be, I would probably put together a
table-driven rewrite system.
In the current program, the rewrites live in "fixblock()" and "fixexpr()",
plus throughout the various "makeexpr_..." functions in expr.c.
P2c 1.21, now in alpha testing, extends the rewrite phase to include some
basic use/def-style flow analysis and constant propagation; this allows
certain idioms (file operations, mostly), to translate more cleanly. I'll
post an announcement in this group when 1.21 is ready.
While we're on the topic of p2c, Kenn Heinrich was just talking about his
experience with whitespace in a program translator.
In p2c, I eventually decided to deal with horizontal whitespace by
ignoring it in the input and regenerating it based on a large set of
user-configurable options. Users can control indentation, spacing around
operators, and so on in great detail. Since most horizontal whitespace is
regular and stylized, this approach works pretty well.
The spacing routines also take care of breaking statements across multiple
lines; since the lengths of things vary during translation, I re-break the
lines from scratch using an optimization algorithm rather like TeX's
paragraph formatter. It assigns a numerical "badness" to various aspects
of a potential choice of line breaks and indentation, then does a mostly
brute-force search of the possible choices. It works well enough that
people have suggested making p2c's back end into a C beautifier, though no
one's done it.
For comments and vertical whitespace (blank lines, which I treat as
special comments), the strategy is to attach serial numbers to statements,
declarations, and other spots in the parse tree, and to load comments into
a separate list with serial numbers relating them back to the tree. That
way, code can be moved around without worrying about comments, and the
parser and rewrite stuff don't have to dodge around comment "tokens" as
Kenn's original program did. Even if a node in the original tree goes
away during translation, its attached comments just drift to the nearest
surviving serial number during output. It works pretty well, though in
practice some of the parser and output routines need to mess with comments
anyway to deal with odd cases.
Return to the
Search the comp.compilers archives again.