GCC as back-end summary

davids@ICSI.Berkeley.EDU (David Petrie Stoutamire)
Fri, 29 Jan 1993 19:41:05 GMT

          From comp.compilers

Related articles
GCC as back-end summary davids@ICSI.Berkeley.EDU (1993-01-29)
GCC + Garbage Collect code = new GCC? Whitten@Fwva.Saic.Com (1993-02-02)
Re: GCC + Garbage Collect code = new GCC? moss@cs.cmu.edu (1993-02-03)
Adding other languages to GCC (was, using GCC for back-end) rms@gnu.ai.mit.edu] (1993-02-04)
Re: Adding other languages to GCC (was, using GCC for back-end) moss@cs.cmu.edu (1993-02-05)
re: Adding other languages to GCC David.Chase@Eng.Sun.COM (1993-02-05)
Re: Adding other languages to GCC moss@cs.cmu.edu (1993-02-07)
[1 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: davids@ICSI.Berkeley.EDU (David Petrie Stoutamire)
Keywords: GCC
Organization: International Computer Science Institute, Berkeley, CA, U.S.A.
Date: Fri, 29 Jan 1993 19:41:05 GMT

This is a summary of responses to my query about using gcc as a
back-end for our new Sather compiler. Eight other people responded
asking to be forwarded any info, so I'm posting this. These messages
are trimmed a bit.


        - Dave


        We are about to write the compiler for the next release of Sather. Our
        previous compiler (written for the most part in Sather) used Yacc and Lex,
        did some Sather processing, and output in C as a sort of portable
        assembly.


        Most of the compiling time is currently spent compiling the resultant C
        code. Naturally, we are thinking about using RTL as an intermediate form
        to hand to gcc. This would result in a much faster compiler that would be
        nearly as portable, although we would still have to be able to generate C
        in order to let people bootstrap Sather.


        RTL (from the gcc manual) isn't too bad, but removing the C-specific part
        of gcc and linking in our code looks like it would require a lot of
        knowledge of gcc. The new versions of the compiler look much cleaner than
        before, but we would rather be spending time on language issues than
        mucking with the code.


        I would love to hear from anyone who has used gcc as a back-end, or
        pointers to other attempts to do so. Ideal would be a stripped down
        version which read RTL instead of C files. I imagine such a thing would
        be very useful to others as well.


        I will appreciate any help... thanks!
-----------------------------------------------------------------
>From comp.compilers moderator:
[The GNU Fortran project must have faced a similar problem. It's headed up
by Craig Burley <burley@apple-gunkies.gnu.ai.mit.edu>. -John]
-----------------------------------------------------------------
>From matthes@src.dec.com Wed Jan 20 13:15:35 1993


Our group at Hamburg University, Germany, also uses C as a "portable"
target assembly language for the compilation of a higher-order persistent
language. Based on our previous experience in generating code for the
commercial Sun optimizing compiler back end shared by their C, C++,
Modula-2, Pascal and Fortran compiler, we decided to stay away from
intermediate representations a la RTL.


Nevertheless, I would be very interested to learn more about the
experience of other groups in this field. Perhaps there has been some
progress to truely decouple the frontends from the bits and bytes of the
IR. I would therefore appreciate if you could forward (a summary of) the
information you receive.
-----------------------------------------------------------------
>From jcook@goober.cs.colorado.edu Wed Jan 20 13:27:23 1993


I tried exactly what you're describing back in '89, when I was taking a
compilers class. I spent spring break (all of it) trying to separate the
RTL back end, and ended up failing - the linking dependencies killed it. I
then did my own RTL'ish intermediate language and the back-end to 68000
code, but it kept me from having the time to add more functionality to the
front end.


I think having the RTL translator available separately would be great, and
am surprised no-one has done it yet. Best of luck :-).
-----------------------------------------------------------------
>From bondp@source.asset.com Thu Jan 21 07:06:41 1993


I think is is exactly what the GNU ADA (GNAT) project is doing. Check
with Ed Schonberg (schonber@SCHONBERG.CS.NYU.EDU) for more information.
-----------------------------------------------------------------
>From perle@cs.tu-berlin.de Thu Jan 21 10:53:48 1993


We are investigating the same thing for the (locally developed)
applicative programming language OPAL, the main aspect of whose back - or
rather middle - end is compile time optimization of garbage collection.


The current solution is, as you did, to generate C source code to be
further handled by the Gnu C Compiler. And of course, to skip the step of
flattening an abstract tree to text form and then having it parsed again
should be a significant speedup.


A group of of three students, including me, was assigned the task of
finding, analyzing and reporting possible interfacing methods.


This is what we found out so far:


At the initial moment of their generation during the parsing process (i.e.
before further optimizations take place), RTL expressions are restricted
to machine specific formulations of the intended semantic action. For
example, the code generated for the SPARC architecture for loading an
immediate value of sufficient size (= not fitting in a single opcode) into
a (at that time, virtual) register will consist of code for loading the
more significant bits and then adding the rest, because this is the way
the SPARC will do it.


So, generating your own RTL expressions would mean to implement all those
machine specific decisions, unless you make use of gcc's RTL generating
functions. These get adapted to the machine dependent needs at the time of
compiling gcc itself. Most of them, unfortunately, do expect appropriate
variants of gcc's tree data structure to be guided in what functionality
to generate. Gcc constructs partial syntax trees per top level definition
(e.g. function) from the sourcecode, then generates RTL code from it and
discards the tree. It seems there is no routine included for reading in
either RTL or trees.


Luckily, as far as we can tell gcc accesses the tree objects only via some
#define macros from tree.h . Therefore, it should be possible to construct
a new tree.h which makes gcc believe that our own abstract "imperative
code" tree is just what it expects. Of course, this would also require to
appropriately link the executables together, including control flow.
Also, the implications of gcc's destructive access to otherwise
functionally clean data structures must be considered. If we have to copy
them to avoid problems, we could even adapt them to the expected form.


The effort to implement this scheme should not be underestimated.


There are quite many possibilities to draw the border line. A clean and
easy solution would be to define a functional datatype which resembles the
tree structure logically (maybe leaving out relative obscurities like the
++ operation etc.), and write a new .y grammar description file for gcc
which parses a simple prefix notation of the tree, which can be easily
output from the functional parser back end. The parsing function could
(later on?) be hand coded to avoid large tables.


One could even try to translate the RTL generating functions from C to the
functional language of your choice - either manually, or better yet
automatically.


Any way, a thorough investigation of struct tree, or rather its macro
interface and an exact documentation of what we found should be the next
thing to do.


It is not sure whether our project will be continued to reach any
implemenation, for the prospectable amount of work which is required
exceeds the limitations of the course this project is part of.
-----------------------------------------------------------------
>From thiemann@informatik.uni-tuebingen.de Fri Jan 22 01:54:28 1993


try to contact J"orn von Holten aka <aspect@sun1d.informatik.Uni-Bremen.DE>.
They are implementing a strict functional language named aspect, have been
generating C code and are switching to using RTL, as I'm told.
-----------------------------------------------------------------
>From moss@CRAFTY.FOX.CS.CMU.EDU Fri Jan 22 07:43:57 1993


You can try to GNU Ada people at NYU. One contact is Professor Ed
Schonberg (schonber@cs.nyu.edu (note spelling of user id!)). They have an
Ada front end that output RTL to the gcc back end. I lead the GNU Modula-3
project, which wrote a new front-end, but it's more "in bed" with gcc's
way of doing things.
-----------------------------------------------------------------
        This is a message sent to our internal Sather group which is related,
        so I'm including it in the summary (we have garbage collection in
        Sather).


>From CRAFTY.FOX.CS.CMU.EDU!moss Fri Jan 22 20:38:22 1993


Thanks for your inquiry concerning our gc enhancements. I don't have a lot
of time at the moment, but I should be able to give you the general
picture.


One of my students, Amer Diwan, enhanced the gcc back-end to keep track of
pointers, address arithmetic optimizations, etc., and to output tables to
support gc. The idea is that for selected execution points where gc can
occur (e.g., at calls, allocations, etc., but, to keep table size down,
etc., NOT at every instruction) the table allows one to determine which
registers have live pointers to objects, and which stack locations in the
current frame. By using the return pc values into previous frames, we can
probe the table for other frames, and reconstruct the contents of
registers, and thus "decode" the entire stack.


Additionally, one may have what we call "derived values", which are based
on either pointers to (the starts of) objects or other derived values. To
perform a gc, one "subtracts off" the values from which a derived value is
derived, does the gc (possibly moving objects), and then "adds back" the
values back into the derived value, thus adjusting it. This works for
derived values that are the sum or difference of one or more pointers,
plus any additional offset (the offset need not be a constant). The
derived value stuff handles strength reduction topimizations (which
produce pointers that step through arrays), and a variety of other
optimizations that result in derived values that do NOT point anywhere
into the object(s) to which they are connected. We also guarantee the base
pointers are still live and available for this computation at the gc
points, etc. Finally, the scheme does significant compression on the
tables. This is all reported in our paper presented at SIGPLAN '92, which
you can obtain via anonymous ftp from ibis.cs.umass.edu (look in
~pub/papers for interesting stuff).


There are two other things needed to support gc. One is a way of informing
the collector of stores of pointers. This is needed because the collector
is generational, and need to know about new pointers from old generations
into younger ones. The compiler has support for a number of schemes for
doing that, selectable by command line options. Finally, one needs the
actual colelctor and the stack decoding run time. Both of these have been
built. The collector we use is the UMass Language Independent GC Toolkit.
There is a technical report describing an earlier version of this
collector in the aforementioned pub/papers directory. We can make the
toolkit available for research use only; commercial use may require
licensing. The directly gcc related stuff will be offered back to the FSF,
though we have not handed it out yet, since GNU M3 is not even into the
alpha release stage (though that is only a matter of weeks now).
-----------------------------------------------------------------
>From hudson@yough.ucc.umass.edu Fri Jan 22 16:14:33 1993


I think it might be worth measuring how much time is actually spent in
these routines. Optimizing takes up a lot of time and the gcc front end is
designed to be pretty fast (at the expense of being 1 pass). You might end
up with less speedup that you anticipate.


> RTL (from the gcc manual) isn't too bad, but removing the C-specific part
> of gcc and linking in our code looks like it would require a lot of
> knowledge of gcc. The new versions of the compiler look much cleaner than
> before, but we would rather be spending time on language issues than
> mucking with the code.


The dream of us all, but the reality of getting a Modula-3 compiler that
produces quality code has forced us to redo the frontend of gcc and add to
the backend. We changed the front end to build a parse tree using an
enhanced version of the tree.h nodes. From there we type checked and
generated rtl that included the type information we needed and off we
went. Going to our internal parse tree nodes might be a another approach
since it buys into some of our backend type work with greater ease. The
down side of our approach is that we ended up hacking in C instead of
Modula-3 (or Sather). To understand some of the work we did see


@InProceedings{DMH91,
    author = "Amer Diwan and J. Eliot B. Moss and Richard L. Hudson",
    title = "Compiler Support for Garbage Collection in a Statically
    Typed Language",
    pages = "273--282",
    booktitle = "Conference on Programming Language Design and Implementation",
    year = 1992,
    organization = "SIGPLAN",
    publisher = "ACM Press",
    address = "San Francisco, California",
    month = "June"
}


If Sather needs a GC you should at least look at this paper. As for timing
we are still several months away from having gm3 ported to a MIPS or Alpha
machine so you might just keep us in mind and talk to us this summer to
see where we are. Good luck. Eliot (Moss@cs.umass.edu) is also working on
this project.
-----------------------------------------------------------------
>From muller@src.dec.com Thu Jan 28 14:09:09 1993


We are working on a Modula-3 compiler that uses gcc as its back-end. Not
everything is working but we have made good progress so far, and here is a
summary.


First, you probably don't want to generate RTL directly. If you look at
tree.def, you will see that there is another representation of the code,
let's call it the tree representation. The parsers generate tree
representations that are essentially machine and language-independant.
The code that transforms this tree representation to RTL knows a lot about
the machine for which you are compiling; for example it knows the calling
conventions, such are r4 to r7 contains the first 4 words of integer type
actual arguments on mips.


Unfortunately, the tree representation is very poorly documented; in
particular, the collection of functions you need to call is not very
clear. Some of the node types are not implemented, and some of them are
still language dependant (for example, integer addition does not check for
overflow).


Since gcc was not the obvious choice of back-end for us, we have written
our own back-end interface. Our hope was to be able to implement that
interface in various ways to satisfy our needs. One of the goals was that
it should be possible to implement a naive code generator easily; this
means that the interface is very low level; for example, frame pointers
for nested procedures are manipulated explicitly.


The implementation of our interface consists mostly into making the calls
that the C parser would make if presented with a C program that does the
same as the Modula-3 program we are compiling. We save cpp, the parsing,
a fair amount of checking.


So far, we have made only one small modification in the code that
generates the RTL (which we don't consider to be ours) to support nested
procedures. We expect more extensive modifications to handle correctly
exceptions.


I have included our interface below (it's actually in two interfaces, plus
one machine specific interface). It will probably be modified somewhat
before we have a running compiler, and it probably needs a lot of
documentation. [text not included in this summary - Dave]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.