Thu, 3 Feb 1994 00:34:58 GMT

Related articles |
---|

expression rewrites - how to? intrepid!gary@uunet.uu.net (1994-01-31) |

Re: expression rewrites - how to? synaptx!thymus!daveg@uunet.UU.NET (Dave Gillespie) (1994-02-03) |

Newsgroups: | comp.compilers |

From: | Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET> |

Keywords: | translator, parse |

Organization: | Compilers Central |

References: | 94-02-011 |

Date: | Thu, 3 Feb 1994 00:34:58 GMT |

Gary Funck writes:

*> The source language (Pascal) and the target language (Ada) have*

*> differing operator priorities (and associativity). The translator*

*> must rewrite expressions from the source language into those*

*> of the target language, while preserving the order of evaluation*

*> expressed in the source language.*

Our moderator outlined the basic answer in his postscript to your

message, but I thought I'd contribute some more background and a

few suggestions.

*> An additional constraint that*

*> is desirable is to keep the parenthesization present in the original*

*> source text, if this parenthesization is adequate to preserve the*

*> order of evaluation when translated into the target language.*

*> Additionally, the expression should be rewritten with minimal*

*> parenthesization.*

These two sentences are at odds with each other, so you'll have to figure

out what exactly you mean by them.

Basically, a good translator has to worry about the following four steps:

1. Add parentheses when Ada requires them but Pascal doedn't.

2. Remove parentheses which Pascal requires but Ada doesn't.

3. Add optional parentheses to enhance clarity.

4. Preserve unnecessary parentheses which were there for clarity.

Only step 1 is necessary for a correct translation. In my Pascal to C

translator "p2c" I found steps 2 and 3 to be a very good idea; I don't do

step 4 in p2c but I probably should. (I've had one or two comments about

this from p2c users in the past few years, but it's interesting that none

of them considered step 4 to be vital.)

I also faced this problem with Calc, a calculator/algebra system for GNU

Emacs that supports several "language modes" for algebraic formulas. It

can both parse and display in C, Pascal, FORTRAN, TeX, and a few other

languages. The "translation" happens implicitly when you enter a formula

in one language mode and then switch to another language mode for display.

I found the operator precedence parsing algorithm to work well in both

programs. Calc uses an explicit table-driven recursive operator

precedence parser. The table keeps a "left" precedence and a "right"

precedence for each operator. Calc uses the same table for both parsing

and formatting expressions.

The parsing algorithm looks like this; you ought to find something like it

in any compiler book:

parse_expr(prec)

{

parse a "factor", e.g., variable name or literal;

while (next-operator-left-precedence >= prec) {

skip and save operator token;

call parse_expr(the-operator-right-precedence);

}

*}*

For left associative operators, the right-precedence is one greater than

the left-precedence; for right associating operators, the left-precedence

is one greater.

The display algorithm in Calc looks like this:

format_expr(expr, prec)

{

if (expr is a factor)

return formatted-factor;

else if (prec > min(expr-left-prec, expr-right-prec))

return "(" + format_expr(expr, 0) + ")";

else

return format_expr(left-subexpr, expr-left-prec)

+ operator

+ format_expr(right-subexpr, expr-right-prec);

*}*

P2c uses an ad-hoc algorithm for parsing, but its formatting algorithm is

essentially like Calc's.

The above routines will give you steps 1 and 2 of my outline but not 3 or

4. I've found it helps quite a bit to add some unnecessary parentheses as

described in step 3. For example, p2c adds the unnecessary parens in "(a

< b) == (c < d)" by default, and optionally also parenthesizes "(a * b) +

(c * d)" and "(a && b) || (c && d)". P2c uses ad-hoc heuristics to decide

when to add unnecessary parens; I worked out the heuristics just by trying

lots of things and seeing what looked "nice," although most of them would

be familiar to any experienced C programmer.

In this sense, you're actually pretty lucky going from Pascal to Ada,

since the basic precedence layouts are similar even if they differ in

details.

For step 4, you want to include explicit parentheses as a kind of

pseudo-operator. Be careful not to generate two sets of parentheses by

accident. Also, there is the tricky question of what to do about

parentheses that *were* necessary in Pascal but are not in Ada. Consider

the expression "(a = b) and (c = d)", in which the parens are required.

Translating to Ada, "a = b and c = d" is sufficient. I would be inclined

to remove the parens in this case and then allow step 3 to add them back

in if the extra-parens option is turned on. But you might be able to kill

two birds with one stone by preserving all parens, even if they were used

"under duress" in the Pascal. This strategy might preserve too many

undesirable parentheses, but I'll bet Pascal and Ada are similar enough

for you to get away with it.

Another case to watch out for, taking Pascal-to-C as an example, is

"if (a = b) then ...", which contains unnecessary parentheses that should

*not* lead to "if ((a == b)) ...". My C code generator emits the parens

around an "if" condition separately, not as part of the expression

generator. If I had p2c preserve parens, I'd have to do something special

to avoid this pitfall. I don't remember my Ada syntax well enough to say

whether or not this kind of thing crops up with Ada.

If other parts of your translator actually examine and modify expression

trees, watch out for funny interactions with explicit parentheses. For

example, if the array A is 1-based in Pascal but 0-based in C, p2c

translates "A[i+1]" to "A[i]", not "A[i+1-1]". If for some reason the

Pascal code said "A[(i+1)]", you'd have to ask whether it's better to

produce "A[(i+1)-1]" or "A[i]"; you probably *don't* want to produce

"A[(i)]". This example is kind of silly since people don't write [( ...

)], but there are plenty of other situations in p2c where this kind of

thing could plausibly happen. Again, Ada is much more Pascal-like than C,

so maybe this will never come up for you.

-- Dave

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.