Re: Adding other languages to GCC (was, using GCC for back-end)

burley@apple-gunkies.gnu.ai.mit.edu (Craig Burley)
Mon, 8 Feb 1993 18:44:31 GMT

          From comp.compilers

Newsgroups: comp.compilers
Keywords: GCC, Fortran
Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139
References: 93-02-011 93-02-052

moss@cs.cmu.edu (Eliot Moss) writes:


      The Modula-3 front end also generates trees. Because M3 allows arbitrary
      use before definition, the front end is actually two pass, to resolve
      identifier bindings properly. If other languages would benefit from such
      two pass processing in the front end, they can use our code as a model
      (once we release it :-) .... Eliot


The GNU Fortran (g77) front end isn't really designed as a two-pass front
end, because it is easy to design a model in which FORTRAN 77 is compiled
to assembler code in one pass.


However, enter the gcc back end and the procedure calling interface, and
it made sense to add what amounts to a second pass to the g77 front end to
reduce the amount of redevelopment needed in the back end, so I did this.


What now happens is that while g77 is recognizing statements and building
its own internal trees describing the expressions in the statements,
instead of directly calling on the back end to emit the statements, it
saves the statements until it reaches the end of the program unit. At
that point, the end-transition action happens, which makes the "final
decisions" about what all the symbols actually are, and _then_ the list of
saved statements (and info on all the symbols) is revisited to call the
back end. In other words, the back end is unaware that anything is going
on until after the front end sees the END statement. This isn't ideal or
even strictly necessary, especially given that the back end currently
saves up all the RTL anyway, but it sure made it easier to get g77 up and
running without my needing to become an expert on the back end.
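

The save-then-replay scheme described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not g77's
actual code: every name here (saved_stmt, program_unit, save_stmt,
backend_emit, end_program_unit) is hypothetical, a string stands in for the
front end's internal tree, and the "final decisions" about symbols are
elided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch; none of these names come from g77 itself. */
struct saved_stmt {
    const char *text;            /* stands in for the front end's tree */
    struct saved_stmt *next;
};

struct program_unit {
    struct saved_stmt *head;     /* statements saved in source order */
    struct saved_stmt *tail;
    int emitted;                 /* count of statements given to the back end */
};

/* Pass 1: remember the statement; the back end is not called at all. */
static int save_stmt(struct program_unit *u, const char *text)
{
    struct saved_stmt *s;

    assert(u != NULL && text != NULL);
    s = malloc(sizeof *s);
    if (s == NULL)
        return -1;
    s->text = text;
    s->next = NULL;
    if (u->tail != NULL)
        u->tail->next = s;
    else
        u->head = s;
    u->tail = s;
    return 0;
}

/* Stand-in for a real back-end call; here it only counts statements. */
static void backend_emit(struct program_unit *u, const struct saved_stmt *s)
{
    (void)s;
    u->emitted++;
}

/* END statement: symbol decisions would be finalized here (elided),
   then the saved statements are replayed to the back end in order. */
static int end_program_unit(struct program_unit *u)
{
    struct saved_stmt *s;

    assert(u != NULL);
    s = u->head;
    while (s != NULL) {
        struct saved_stmt *next = s->next;
        backend_emit(u, s);
        free(s);
        s = next;
    }
    u->head = u->tail = NULL;
    return u->emitted;
}
```

A caller would invoke save_stmt once per recognized statement and
end_program_unit on seeing END; until then the emitted count stays at zero,
which is the sense in which the back end is unaware that anything is going
on.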


Before the change to two passes, made partway through the process of
integrating the front end with the back end (85% of the front end having
been coded by the time integration started, don't ask why :-), I'd already
written some code to revise the RTL in cases like this:


ASSIGN 10 TO I
...
10 FORMAT(...)


Most ASSIGN statements refer to labels for executable statements, but they
may refer to FORMAT statements instead, and it's the label's definition
that determines which. The implementation at the assembler level can be made
independent of the label type, but with layers of back end code (where
info on whether a reference is to a procedure label or a variable is kept)
on top of the conceptual assembler level, g77 had to make a choice. When
g77 saw the ASSIGN statement above, it assumed that label 10 was
executable (as far as back-end stuff goes) and acted accordingly. Then,
when it saw the FORMAT, g77 had special code to revisit the RTL, search
for references to the label, and change them so they'd work for a FORMAT
label. (Needless to say, it annoyed me that front-end code had to muck
with the RTL, so I was happy to delete this code when changing g77 to call
on the back end only after a complete pass over the source program unit.)
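

As a rough sketch of why the second pass removes the need to patch RTL:
if the first pass merely records label references and lets each label's
defining statement set its kind, then by the time the back end is called
every defined label's kind is already final. The names below (label_kind,
label_table, note_assign, define_label, kind_for_emit) and the fixed-size
table are assumptions made for illustration, not g77's actual data
structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; not g77's actual label handling. */
enum label_kind { LABEL_UNKNOWN, LABEL_EXECUTABLE, LABEL_FORMAT };

struct label {
    int number;
    enum label_kind kind;
};

#define MAX_LABELS 16            /* arbitrary limit for the sketch */

struct label_table {
    struct label labels[MAX_LABELS];
    size_t count;
};

/* Find a label, creating it with unknown kind on first mention.
   Returns NULL if the table is full. */
static struct label *lookup_label(struct label_table *t, int number)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < t->count; i++)
        if (t->labels[i].number == number)
            return &t->labels[i];
    if (t->count == MAX_LABELS)
        return NULL;
    t->labels[t->count].number = number;
    t->labels[t->count].kind = LABEL_UNKNOWN;
    return &t->labels[t->count++];
}

/* Pass 1, "ASSIGN 10 TO I": only note that the label is referenced. */
static void note_assign(struct label_table *t, int number)
{
    (void)lookup_label(t, number);
}

/* Pass 1, label definition: the defining statement decides the kind. */
static void define_label(struct label_table *t, int number,
                         enum label_kind kind)
{
    struct label *l = lookup_label(t, number);
    if (l != NULL)
        l->kind = kind;
}

/* Pass 2: called while replaying saved statements to the back end. */
static enum label_kind kind_for_emit(struct label_table *t, int number)
{
    struct label *l = lookup_label(t, number);
    return l != NULL ? l->kind : LABEL_UNKNOWN;
}
```

For the ASSIGN/FORMAT example above, note_assign(&t, 10) leaves the kind
unknown, the later FORMAT calls define_label(&t, 10, LABEL_FORMAT), and
only then does the replay ask kind_for_emit, so no guess ever has to be
revised.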


If you're curious about what made 2-pass a necessity, here's a
straightforward example. g77 is designed (for now, anyway) to make object
files compatible with f2c+gcc. The procedure calling interface defined by
f2c passes CHARACTER variables by appending, for each CHARACTER variable in
the calling sequence, a pass-by-value length to the end of the argument list.


For example, given


CHARACTER A*5, B*10, C*15
INTEGER I(10)
CALL FOO (A, I, B, C)


FOO is called like this (in C):


FOO (&A, &I, &B, &C, 5, 10, 15);


That is, the lengths for A, B, and C are passed to FOO. Now, let's look
at the first several lines of a sample subroutine:


SUBROUTINE BAR (A, I, B, C)
CHARACTER A*(*), B*10, C*(*)
INTEGER I(10)
C
I(1) = 5


At this point, if g77 were one-pass, it'd need to tell the back end about
function BAR so it could start emitting executable code (for the I(1) = 5
line), but what calling sequence would it give for BAR? Answer: it
couldn't give a definitive calling sequence, because if, say, the next
executable line is


C = A // B


then the calling sequence for BAR would be the same as for FOO in the
previous example (though B's length argument would be ignored, since BAR
declares its own length for B), whereas if the next executable line is


C = A // B()


then B is a CHARACTER _function_, not _variable_, and therefore no length
parameter for B is passed, meaning BAR has, in C terms, six arguments
instead of seven. (Of course, the caller has to know this too, but that's
not important here.)
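

The six-versus-seven count can be checked with a small sketch. It assumes
only the rule stated above (one trailing length per CHARACTER variable,
none for a CHARACTER function or a non-CHARACTER argument); the names
dummy_kind and c_level_arg_count are hypothetical and are not part of g77
or f2c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a dummy argument once the whole
   program unit has been seen. */
enum dummy_kind {
    DUMMY_NUMERIC,         /* e.g. INTEGER I(10): no length argument */
    DUMMY_CHAR_VARIABLE,   /* CHARACTER variable: one trailing length */
    DUMMY_CHAR_FUNCTION    /* CHARACTER function: no trailing length */
};

/* Number of C-level arguments: one per dummy argument, plus one
   appended length for each CHARACTER variable. */
static size_t c_level_arg_count(const enum dummy_kind *dummies, size_t n)
{
    size_t count = n;
    size_t i;

    assert(dummies != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (dummies[i] == DUMMY_CHAR_VARIABLE)
            count++;
    return count;
}
```

With (A, I, B, C) classified as variable, numeric, variable, variable the
result is 7; reclassifying B as a CHARACTER function gives 6. The
classification of B is not known until the statement using it has been
seen, which is the point of the example.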


There are other examples like this, some real hairy (how to handle
alternate ENTRY points in one pass, especially given the facts that
discovery of alternate entries can happen long into executable code and
that the back end doesn't understand alternate entries yet) and some just
annoying to handle properly in the back end (like how variable/array
"static" initialization can be the last thing prior to an END statement,
and after lots of uses of the variable/array).


So while most of the g77 front end still is designed for one-pass use, the
parts that interface to the back end are build-time switchable between
one/two pass, and for building g77 itself, two passes always are used.
(The front end is designed to be fairly independent of the back end used,
or even of whether it is part of a compiler vs., say, part of a
source-code checker or transformer. It doesn't save whitespace info in
its internal trees, so it's not completely flexible. :-)


A truly language-independent back end would not have these problems, I
guess, and the gcc back end definitely was designed from C's point of
view. But it was nowhere near as hard to interface and otherwise deal
with as I expected, based on my other experiences with compiler internals,
and certainly the source code didn't cost me anything (plug, plug :-).
--
James Craig Burley, Software Craftsperson burley@gnu.ai.mit.edu
--

