
From: markwh04@yahoo.com
Newsgroups: comp.compilers
Date: 30 Dec 2005 22:17:17 -0500
Organization: http://groups.google.com
References: 05-12-081
Keywords: parse, syntax

aegis wrote:

> Are syntax directed translation and syntax directed definition
> synonymous?
>
> [I've never heard of syntax directed definition. -John]

A syntax directed definition is a high-level specification that defines
each item in terms of its syntactic components (e.g., in a natural
language grammar with the phrase structure rule S -> NP VP, an instance
of S would be defined in terms of the instances of NP and VP that make
it up).
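As a concrete (if toy) illustration, a syntax-directed definition can be rendered as one action per rule that builds an instance of the left-hand side from instances of its components. The following Python sketch uses invented builder functions (NP, VP, S) purely for illustration:

```python
# A minimal sketch (hypothetical names): each grammar rule gets an
# action telling how to build an instance of its left-hand side from
# instances of its syntactic components.

def NP(noun):                # an instance of NP, built from its parts
    return {"head": noun}

def VP(verb, obj=None):      # an instance of VP
    return {"verb": verb, "object": obj}

def S(np, vp):               # rule S -> NP VP: S is defined from NP, VP
    return {"subject": np, "predicate": vp}

sentence = S(NP("cat"), VP("sleeps"))
print(sentence["subject"]["head"])   # cat
```

The point is only that the definition is compositional: an S-instance exists solely by virtue of the NP- and VP-instances it is assembled from.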

The term is fairly standard in Computer Science, and more information
can be found (for instance) by doing a web search on Yahoo, Google or
elsewhere.

A simple syntax directed translation, or SSDT (with input alphabet X,
output alphabet Y), is just a context-free grammar over the direct
product monoid X* x Y*. That's equivalent to specifying a context-free
grammar over the alphabet X union Y, but allowing commutativity (xy =
yx for x in X, y in Y). [See "Appending Remarks" below.]

It is also, therefore, equivalent to that which may be processed by a

push-down transducer with the same input and output alphabets (which,

in turn, is equivalent to a push down automaton over the monoid X* x

Y*).
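For a toy instance of the product-monoid view, consider the one-rule SSDT S -> (a,x) S (b,y) | (1,1), a context-free grammar over {a,b}* x {x,y}* that pairs a^n b^n with x^n y^n. A minimal Python sketch (the grammar is invented for illustration):

```python
# A minimal sketch (assumed grammar): the SSDT with the single rule
#   S -> (a,x) S (b,y) | (1,1)
# is a context-free grammar over the product monoid {a,b}* x {x,y}*.
# Each derivation yields a pair (input word, output word).

def derive(n):
    """The pair produced by applying the recursive rule n times."""
    if n == 0:
        return ("", "")                        # the empty pair (1, 1)
    inp, out = derive(n - 1)
    return ("a" + inp + "b", "x" + out + "y")  # wrap with (a,x)...(b,y)

print(derive(3))   # ('aaabbb', 'xxxyyy')
```

Because input letters commute with output letters, the pair representation loses nothing: any interleaving of a's and x's in a derivation normalizes to the same (input, output) pair.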

A syntax directed translation, or SDT, is more general in that the

phrase structure rules allow for the order of non-terminals to be

swapped around.

That's equivalent to adding a stack mechanism to a simple syntax

directed grammar (to hold instances of non-terminals that are to

participate in said swapping).
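A minimal sketch of such a swap, with an invented one-rule grammar: the rule pair -> item item is given the translation element item2 item1 (the two nonterminals in reversed order), and the swap is realized by recursive descent over the parse tree:

```python
# A minimal sketch (invented grammar): in a non-simple SDT the
# translation element may reorder the nonterminals.  Here the rule
#     pair -> item item      with translation      item2 item1
# swaps the two subtrees; recursive descent emits the children in
# the reordered sequence.

def translate(tree):
    if isinstance(tree, str):      # terminal: copy through
        return tree
    _rule, left, right = tree      # node for rule  pair -> item item
    # the output side lists the nonterminals in swapped order
    return translate(right) + translate(left)

print(translate(("pair", "a", ("pair", "b", "c"))))   # cba
```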

I don't know precisely what the relative powers of SDTs and SSDTs are.
But one can specify an SDT through an SSDT, provided you add extra
items to the output alphabet Y to encode the actions of a stack
machine. The output itself then has to be algebraically reduced
through whatever underlying stack algebra is associated with those
extra items.

A simple syntax directed definition is associated with an SSDT. For
every rule, such as S -> NP VP, and for every instance (denoted here
NP(n), VP(v)) of NP and VP, the definition may be regarded as an
"output action" affixed to the rule, telling you how to build the
corresponding instance S(s) of S; i.e.

S(s) -> NP(n) VP(v) { build s from n and v like so ... }

Then, the SSDT, T, will translate each word from X* into a word in Y*,

with Y being the language of build instructions; and the output of a

word w in X* will be a set, T[w], of words each of which consists of an

instruction sequence encoding the actual item to be built.
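A minimal sketch of this reading, with an invented build-instruction alphabet: the output word is a postfix sequence of instructions, and a small stack interpreter replays it to construct the item:

```python
# A minimal sketch (invented instruction set): the SSDT's output for
# a rule like  E -> E + T { build s from n, v }  can be a word of
# build instructions; an interpreter replays the word on a stack to
# construct the encoded item.

def run(instructions):
    """Execute a build-instruction word on a stack."""
    stack = []
    for ins in instructions:
        if ins == "+":                      # combine the top two items
            b, a = stack.pop(), stack.pop()
            stack.append(("plus", a, b))
        else:                               # push a leaf item
            stack.append(ins)
    return stack.pop()

# a hypothetical output word for "1+2+3" builds a left-leaning tree
print(run("12+3+"))   # ('plus', ('plus', '1', '2'), '3')
```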

In that light, the relation between SDT and SSDT may be clarified. Just

take the SSDT as a syntax-directed definition for the parse tree,

itself, but add in a new start symbol S' and rule S'(output) -> S(root)

{ convert root into output }, which calls an algorithm that squashes

the parse tree back down into an output stream, by recursively

descending into its structure. The node swaps can be performed during

this process.
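A minimal sketch of that squashing step, with an invented per-rule ordering table: the parse tree is taken as the SSDT's output, and a recursive descent flattens it to the output stream, performing the node swaps along the way:

```python
# A minimal sketch (invented rule names): the rule S' -> S { squash }
# recursively descends into the parse tree and flattens it back into
# an output stream, applying each rule's child reordering as it goes.

ORDER = {"swap": (1, 0), "keep": (0, 1)}   # per-rule child order

def squash(node):
    if isinstance(node, str):              # leaf: emit as-is
        return node
    rule, *children = node
    # emit the children in the order this rule's translation dictates
    return "".join(squash(children[i]) for i in ORDER[rule])

print(squash(("swap", "a", ("keep", "b", "c"))))   # bca
```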

Appending Remarks: Some terminology & background

A monoid is an algebra with an associative product a,b |-> ab

(concatenation) and identity 1 ("empty word"). The monoid formed freely

from a set X is just X*, the monoid of all words from X -- X* is the

place where classical formal language and automata theory lives.

The direct product M x N of monoids M, N is the monoid built up from M

and N with the additional supposition of the relation mn = nm for m in

M, n in N. It may be defined formally as the set of ordered pairs with

m in M corresponding to (m,1); n in N corresponding to (1,n), products

being defined by (m,n)(m',n') = (mm',nn') so that the "product" of m

and n would correspond to mn <-> (m,1)(1,n) = (m,n) = (1,n)(m,1) <->

nm.
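A minimal sketch of the ordered-pair construction, taking M and N to be free monoids of strings (so the monoid product is concatenation):

```python
# A minimal sketch of the direct product monoid M x N as ordered
# pairs, with product (m,n)(m',n') = (mm',nn').  Here M = N = the
# free monoid of strings, whose product is concatenation.

def mult(p, q):
    (m, n), (mp, np_) = p, q
    return (m + mp, n + np_)

one = ("", "")                       # the identity (1, 1)

# m <-> (m,1) and n <-> (1,n) commute in the product: mn = nm
m, n = ("a", ""), ("", "x")
print(mult(m, n) == mult(n, m))      # True
```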

The free extension M[Q] of a monoid by a set Q is obtained by adding

the extra items in Q to M, subject to no additional relations. Hence,

for instance X*[Q] = (X union Q)*.

A grammar G = (Q,S,P) over a monoid M consists of a set, Q, of
non-terminals; a starting configuration S in M[Q] (the restriction of
S to Q that is usually imposed is inessential); and a subset P of
M[Q] x M[Q] comprising the rules. (The requirement that domain(P) and
M be non-intersecting is usually imposed, but is also inessential.)

Associated with it is a transition relation => over M[Q] with (1) A=>A
for all A in M[Q]; and (2) A=>BED whenever A=>BCD and (C,E) in P. So
each configuration A may be associated with the set of words it
generates, [A] = { m in M: A => m }.

All the rules in P encode one-step transitions: (C,E) in P gives C=>E.
So the above may be equivalently regarded as a system of inequalities
(sitting atop an underlying partially ordered algebra) in which the
ordering => has the additional properties of compatibility (A=>B and
C=>D imply AC=>BD) and transitivity (A=>B=>C implies A=>C).

The "language" recognized by G is then L(G) = [S], a subset of M.

The grammar is context-free if domain(P) is a subset of Q (i.e. the
LHS of each rule is a single non-terminal); it then satisfies the
properties [m] = {m}, for m in M; and [AB] = [A][B], for A, B in M[Q].
The grammar is then naturally interpreted as a system of relations,
with (q,A) in P interpreted as [q] superset of [A]. Its least solution
(least in the sense of subset ordering) in the "variables" in Q is
then just (q = [q]: q in Q). The least solution in the main
configuration S is just L(G) = [S]. This provides a concrete
realization of the grammar-as-system-of-inequalities interpretation.
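The least-solution view can be made computational in any finite abstraction of the subset lattice. A minimal sketch (with an invented grammar), solving the boolean question "is [q] non-empty?" by Kleene iteration from the bottom element:

```python
# A minimal sketch (invented grammar): solve the rule system's least
# solution in the boolean abstraction "does [q] contain any word?".
# Start from bottom (nothing generates) and iterate the inequalities
# until a fixpoint is reached.  Lowercase symbols are terminals.

RULES = {                        # q -> list of right-hand sides
    "S": [["A", "B"]],
    "A": [["a"], ["A", "A"]],
    "B": [["B", "b"]],           # B never bottoms out: [B] is empty
}

def nonempty(rules):
    gen = set()                  # bottom element of the lattice
    changed = True
    while changed:
        changed = False
        for q, rhss in rules.items():
            if q not in gen and any(
                all(s in gen or s.islower() for s in rhs) for rhs in rhss
            ):
                gen.add(q)       # inequality [q] >= [rhs] forces q in
                changed = True
    return gen

print(nonempty(RULES))   # {'A'}
```

Since S needs B and B generates nothing, only A survives; the same iteration scheme computes any monotone property of the least solution over a finite lattice.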

Regular grammars are context-free grammars where P subset of Q x (MxQ

union M). The regular subsets of X* are the regular languages over X;

the regular subsets of X* x Y* are the finite state transductions with

inputs from X, outputs in Y; the context-free subsets of X* (and X* x

Y*) are (respectively) the context-free languages and simple syntax

directed translations over the respective input (and output) alphabets.

Similar comments apply to the Turing languages over X and Turing
transductions from X to Y.

Context-sensitive grammars can't readily be defined for general
monoids (though one can come up with a slightly more general notion
in which the requirement |A| <= |B| for (A,B) in P is imposed, where
|A| counts the number of Q's in A). So only 3 out of 4 levels of the
Chomsky hierarchy participate in the foregoing.

[I've been in the computer field for 20 years, and even know the guy

who invented syntax directed translation in 1962, and I still haven't

ever heard of syntax directed definition, although we wrote compilers

that did that. But if you say people use the term, I won't

argue. -John]
