Re: syntax directed translation

markwh04@yahoo.com
30 Dec 2005 22:17:17 -0500

          From comp.compilers

Related articles
syntax directed translation aegis@mad.scientist.com (aegis) (2005-12-26)
Re: syntax directed translation markwh04@yahoo.com (2005-12-30)
Re: syntax directed translation jost@cs.dal.ca (Allan Jost) (2006-01-02)
Re: syntax directed translation gneuner2@comcast.net (George Neuner) (2006-01-07)
| List of all articles for this month |

From: markwh04@yahoo.com
Newsgroups: comp.compilers
Date: 30 Dec 2005 22:17:17 -0500
Organization: http://groups.google.com
References: 05-12-081
Keywords: parse, syntax
Posted-Date: 30 Dec 2005 22:17:17 EST

aegis wrote:
> Are syntax directed translation and syntax directed definition
> synonymous?
>
> [I've never heard of syntax directed definition. -John]


A syntax directed definition is a high-level specification that defines
each item in terms of its syntatic components (e.g. in a natural
language grammar, with a phrase structure rule S -> NP VP, an instance
of S would be defined in terms of the instances of NP and VP that make
it up).


The term is farily standard in Computer Science, and more information
can be found (for instance) by doing a web search in Yahoo, Google or
elsewhere.


A simple syntax directed translation, or SSDT, (with input alphabet X,
output alphabet Y) is just a context-free grammar over the direct
product monoid X* x Y*. That's equivalent to specifying a context-free
grammar over the alphabet X union Y, but allowing commutativity (xy =
yx, x in X, y in Y). [See "Appending Remarks" below]


It is also, therefore, equivalent to that which may be processed by a
push-down transducer with the same input and output alphabets (which,
in turn, is equivalent to a push down automaton over the monoid X* x
Y*).


A syntax directed translation, or SDT, is more general in that the
phrase structure rules allow for the order of non-terminals to be
swapped around.


That's equivalent to adding a stack mechanism to a simple syntax
directed grammar (to hold instances of non-terminals that are to
participate in said swapping).


I don't know precisely what the relative powers of SDT's and SSDT's are


But, one can specify an SDT through an SSDT, provided you add extra
items to the output alphabet Y to encode the actions of a stack
machine. But the output, itself, then has to be algebraically reduced
through whatever underlying stack algebra is associated with those
extra items.


A simple syntax definition is associated with an SSDT. For every rule,
such as S -> NP VP, and for every instance -- denoted here NP(n), VP(v)
of NP and VP, the definition may be regarded as an "output action"
affixed to the rule telling you how to build the corresponding instance
S(s), of S; i.e.
                                                S(s) -> NP(n) VP(v) { build s from n and v like
so ... }.


Then, the SSDT, T, will translate each word from X* into a word in Y*,
with Y being the language of build instructions; and the output of a
word w in X* will be a set, T[w], of words each of which consists of an
instruction sequence encoding the actual item to be built.


In that light, the relation between SDT and SSDT may be clarified. Just
take the SSDT as a syntax-directed definition for the parse tree,
itself, but add in a new start symbol S' and rule S'(output) -> S(root)
{ convert root into output }, which calls an algorithm that squashes
the parse tree back down into an output stream, by recursively
descending into its structure. The node swaps can be performed during
this process.


Appending Remarks: Some terminology & background
A monoid is an algebra with an associative product a,b |-> ab
(concatenation) and identity 1 ("empty word"). The monoid formed freely
from a set X is just X*, the monoid of all words from X -- X* is the
place where classical formal language and automata theory lives.


The direct product M x N of monoids M, N is the monoid built up from M
and N with the additional supposition of the relation mn = nm for m in
M, n in N. It may be defined formally as the set of ordered pairs with
m in M corresponding to (m,1); n in N corresponding to (1,n), products
being defined by (m,n)(m',n') = (mm',nn') so that the "product" of m
and n would correspond to mn <-> (m,1)(1,n) = (m,n) = (1,n)(m,1) <->
nm.


The free extension M[Q] of a monoid by a set Q is obtained by adding
the extra items in Q to M, subject to no additional relations. Hence,
for instance X*[Q] = (X union Q)*.


A grammar G = (Q,S,P) over a monoid M consists of a set, Q, of
non-terminals, a starting configuraiton S in M[Q] (the restriction of S
to Q, that is usually imposed, is inessential), and a subset P of M[Q]
x M[Q] comprising the rules. (The requirement that domain(P) and M be
non-intersecting is usually imposed, but is also inessential).


Associated with it is a transition relation => over M[Q] with (1) A=>A
for all A in M[Q]; and (2) A=>BED wherever A=>BCD and (C,E) in P. So,
each configuration A may be associated with the set of words it
generates, [A] = { m in M: A => m }.


All the rules in P encode 1-step configurations (C,E) in P --> C=>E. So
the above may be equivalently regarded as a system of inequalities
(sitting atop an underlying partially ordered algebra) in which the
ordering => has the additional properties (A=>B, C=>D -> AC=>BD);
(transitivity, A=>B=>C -> A=>C).


The "language" recognized by G is then L(G) = [S], a subset of M.


The grammar is context-free if domain(P) subset of Q (i.e. the LHS of
each rule is a single non-terminal); and then satisfies the properties
[m] = {m}, for m in M; and [AB] = [A][B], for A, B in M[Q]. The grammar
is then naturally interpreted as a system of relations with (q,A) in P
being interpreted as [q] superset of [A]. Its least solution (least in
the sense of subset ordering) in the "variables" in Q is then just (q =
[q]: q in Q). The least solution in the main configuraiton S is just
L(G) = [S]. This provides a concrete realization of the
grammar-as-system-of-inequalities interpretation.


Regular grammars are context-free grammars where P subset of Q x (MxQ
union M). The regular subsets of X* are the regular languages over X;
the regular subsets of X* x Y* are the finite state transductions with
inputs from X, outputs in Y; the context-free subsets of X* (and X* x
Y*) are (respectively) the context-free languages and simple syntax
directed translations over the respective input (and output) alphabets.
Similar comments apply to the turing languages over X and turing
transductions from X to Y.


Context-sensitive grammars can't readily be defined for general monoids
(though one can come up with a slightly more general notion in which
the requirement |A| <= |B| for (A,B) in P is imposes, where |A| counts
the number of Q's in A). So only 3 out of 4 levels of the Chumsky
hierarchy participates in the foregoing.


[I've been in the computer field for 20 years, and even know the guy
who invented syntax directed translation in 1962, and I still haven't
ever heard of syntax directed definition, although we wrote compilers
that did that. But if you say people use the term, I won't
argue. -John]



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.