Re: Pascal to C advice wanted

rkrayhawk@aol.com (RKRayhawk)
27 Feb 2000 02:44:42 -0500

          From comp.compilers

Related articles
Pascal to C advice wanted joseph@cse.iitk.ac.in (Joseph George) (2000-02-21)
Re: Pascal to C advice wanted rkrayhawk@aol.com (2000-02-27)
Re: Pascal to C advice wanted alimarine@zonnet.nl (Artem Alimarine) (2000-04-01)
Organization: AOL http://www.aol.com
References: 00-02-107
Keywords: C, Pascal, translator

Joseph George <joseph@cse.iitk.ac.in> wrote on 21 Feb 2000:
<<
Am I supposed to make a syntax tree while parsing and when i'm
finished, pass the pointer to the tree to the unparser?
>>


I will try to answer this in a basic way, and hope that is at the
right level for the original poster. At the end I have a slight
exaggeration that is used to make a point.


It is possible in some cases to actually parse an entire input
sequence, building an ever growing AST, and only when done with the
whole thing, pass it somehow (pointer or even a file ID), to the next
phase which in your case is the unparser. Alternatively, you can pass
a piece at a time for digestion.


This raises some simple issues. You have to decide the kinds of
things which represent units of work that are gatherable as a set of
nodes for one (partial) engagement of the unparser. Typically
analyzing this leads to several different kinds of units of work, and
so several kinds of engagements with the next phase (the unparser in
your case).


Thus, as a literal example answer to your overall question, you could
have a rule that recognizes the declaration of integers and invokes
some function in the parser with a pointer to the portion of the AST
that represents that declaration and the collection of vars that hang
on it. Likewise you could have a rule that recognizes the declaration
of reals and calls the same function, but either passes a
slightly different control param to indicate REAL (instead of INTEGER),
or lets the encodings in the AST node accomplish the same driving
effect. This function is just one of perhaps many in the 'other
functions' section of your parser - functions written in C, as is the
code generated automatically by your parser tool.
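As a sketch of that shape (all names here are hypothetical, not taken
from p2c or any real tool): one C emitter function shared by the
integer and real declaration rules, selected by a control param. A
yacc action might invoke it as
`{ emit_var_decl(buf, sizeof buf, $2, TY_INTEGER); }`.

```c
#include <stdio.h>

/* Hypothetical sketch: one emitter shared by the INTEGER and REAL
   declaration rules, selected by a control param. */
enum base_type { TY_INTEGER, TY_REAL };

struct var_node {                 /* the vars hanging on one declaration */
    const char *name;
    struct var_node *next;
};

/* Called from a parser action once a declaration is fully reduced;
   writes the translated C declarations into 'out'. */
int emit_var_decl(char *out, size_t cap,
                  const struct var_node *vars, enum base_type ty)
{
    const char *ctype = (ty == TY_INTEGER) ? "int" : "double";
    int n = 0;
    for (const struct var_node *v = vars; v != NULL; v = v->next)
        n += snprintf(out + n, cap - n, "%s %s;\n", ctype, v->name);
    return n;                     /* bytes written */
}
```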


You can thus begin to see that your parser rules fall into at least two
categories: first, those that primarily build parts of the AST; and
second, those at some syncpoint that recognize a completed unit of work
and (after perhaps a little more AST decoration)
primarily engage the emitters.




Any strategy to break up the interaction between your parser and the
next phase (the unparser) must keep in mind the recursive nature of
the input syntax environment of your source language (in this case
Pascal). So you can build the top part of a tree for a program (or
even a unit), not have it complete yet ... then build an attached
subtree for the const or var declarations encountered, passing them
group by group to appropriate functions for translation (unparsing),
passing only the pointer to the uppermost node of these declarations,
not the top node for the outer context (that is the declaration
handlers need not know that they are dealing with a child of an outer
program or unit, nor if it is a program within a program).
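To make the "pass only the uppermost node" idea concrete, here is a
minimal first-child/next-sibling node shape (a hypothetical sketch,
not p2c's actual structures): the declaration handler receives the
subtree root and walks only beneath it, never seeing the enclosing
program node.

```c
#include <stddef.h>

/* Minimal first-child/next-sibling AST node (hypothetical shape). */
enum node_kind { N_PROGRAM, N_DECL_SECTION, N_VAR };

struct node {
    enum node_kind kind;
    struct node *first_child;
    struct node *next_sibling;
};

/* A declaration handler: handed only the N_DECL_SECTION subtree root,
   it walks that subtree's children and never touches the enclosing
   program (or program-within-program) context. */
int count_decl_vars(const struct node *decls)
{
    int n = 0;
    for (const struct node *c = decls->first_child; c; c = c->next_sibling)
        if (c->kind == N_VAR)
            n++;
    return n;
}
```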


One curve in this is that your translator (unparser) must be given a
clue as to whether the declarations are global or local: that is a
lexical aspect that again can be inscribed in the AST or in the
interface from the invocation in your actions to the functions that
handle the declarations.


When you lex there is a sequence of events, namely the encounters with
Pascal tokens. As the parse rules reduce there is a second sequence of
events, namely the recognition of the bundles of tokens. Your later
stage (the unparser) manages a third sequence of events, the emitting
of the C code.


The first two sequences interweave. The third sequence can either
happen all at once only after the other two sequences are completely
over; or you can interweave the third with the dance that is going on
between the first two, but you are completely at liberty to determine
the exact timing of the interweave.




As a very generalized proposition, the interfaces to the later stage
can normally be engaged only as often as a parser rule is reduced. It can
happen less often, and usually does happen less often - the special
case is waiting until reduction of the 'start' rule (which in LR tools
like yacc should be called the 'finish' rule).
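In yacc terms, that special case looks like the following grammar
sketch (the rule layout and the functions `unparse_program` and
`make_block_node` are hypothetical, not from any real tool):

```yacc
%start program

%%

program
    : PROGRAM IDENT ';' block '.'
        { /* This rule reduces last, after the whole input is parsed,
             so this action is the latest possible point at which to
             hand the finished AST to the unparser. */
          unparse_program($4); }
    ;

block
    : decl_list compound_stmt
        { $$ = make_block_node($1, $2); }
    ;
```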


The AST allows you, in effect, to spool rule reductions. When you
recognize that the parser can just as easily emit a file, you can
easily see the spooling aspect, but a file (unless it uses database
technology) is hard pressed to represent the structural aspects you
will need, particularly the recursive features of the language(s).


But if you look at the AST as simply a spool it is less intimidating,
perhaps. Unspool whenever you think it makes sense, which is about as
soon as you have meaningfully complete small parts.


Just because you use an AST, even heavily, does not mean that you are
forbidden to use other data structures in your parser program. You can
use stack-like data structures to push and pop current compilation
context. Access to the top of a stack-like context manager data
structure could be the basis of interaction between your actions and
any invoked emitter routines, in combination with the AST.
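One way to sketch such a stack-like context manager (hypothetical
names throughout): parser actions push a scope on entering a program
or procedure and pop on leaving, and emitter routines consult only the
current top.

```c
#include <assert.h>

/* Hypothetical stack-like compilation-context manager. */
enum scope_kind { SC_GLOBAL, SC_LOCAL };

#define CTX_MAX_DEPTH 64

struct ctx_stack {
    enum scope_kind items[CTX_MAX_DEPTH];
    int top;                      /* -1 when empty */
};

void ctx_init(struct ctx_stack *s) { s->top = -1; }

void ctx_push(struct ctx_stack *s, enum scope_kind k)
{
    assert(s->top + 1 < CTX_MAX_DEPTH);
    s->items[++s->top] = k;
}

void ctx_pop(struct ctx_stack *s)
{
    assert(s->top >= 0);
    s->top--;
}

/* An empty stack means we are at the outermost (global) level;
   this is what an emitter would ask when translating a declaration. */
enum scope_kind ctx_current(const struct ctx_stack *s)
{
    return (s->top < 0) ? SC_GLOBAL : s->items[s->top];
}
```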


Your real problem is in the planning stage: how do you get the various
sequences of events to interweave correctly? That will require a very
clear understanding of anything that could be delayed in recognition.


That is, if you wait until you have put _everything_ into the AST
(spooled it as I would say), then you know everything, but you will
have a massive amount of walking to do. If you will be able to walk
portions of the resulting full AST, emitting output at various points,
then why not emit as soon as those subtrees are built?


The answer: you would be motivated to wait if there is
anything else that might be encountered that would change the meaning
(here semantics is equivalent to mapping to the correct C code) of the
AST built up to some point.


Because Pascal remains true to its pedagogical roots, you do have
syncpoints where you can emit safely. The moderator's advice to study
p2c is valuable both as a source of a model for the timing of emits,
as well as a way to research where others may have found some
exceptions to any assumption of an easy map from Pascal to C - behind
their coding strategy there will be a record of what they consider a
valid sequence interweave.


There is one other point that may interest you if you are relatively
new to this compiler-compiler stuff. And my post kind of assumes that,
so don't take offense if I have been too simplistic, ... just trying
to help. It is important to note, however, that you can under some
circumstances engage the 'later' stages (the unparser in your
translator) from some points in the lexer. This constitutes the basis
of interacting sequences that can actually occur in a way that makes
emits even faster than the rate at which groups of tokens are reduced
as rules in the parser. Motivation to do that is certainly rare. But
if the above description of the issue as a set of interweaving
sequences is useful to you, then it is perhaps useful to see this
later extension. Most likely a call from the lexer to any component in
the later stages (that is, stages past the parser) would not so much
trigger emits as set control modes through various stages.


The tools you are using are deployed in C. So all of the functions of
later stages are global (because all functions are global), so if you
actually needed them, you can get to them from the very front end! I
am not trying to talk you into anything like that. But once you see
that, then you realize that it is, of course, possible for any rule in
_the_parser_ to call anything in the back-end: a piece at a time or
the whole ball of wax. If the lexer could get to the semantics
routines then the semantics routines are just as visible to the
parser!


The real issue is that you need a plan to unitize the work which flows
to the emitters. That plan must be competent relative to the
requirements of each language. But the engagement of the emitters can
be as frequent as you like as long as you can hold outer context
information temporarily (higher in the AST, or partially in stack-like
data structures) while handing processable pieces of work to the
emitters.


Best Wishes,


Robert Rayhawk
RKRayhawk@aol.com

