RE: Two questions about compiler design

"Tom Linden" <tom@kednos.com>
2 Mar 2004 11:03:34 -0500


Related articles
RE: Two questions about compiler design tom@kednos.com (Tom Linden) (2004-02-26)
Re: Two questions about compiler design cfc@shell01.TheWorld.com (Chris F Clark) (2004-02-27)
RE: Two questions about compiler design tom@kednos.com (Tom Linden) (2004-03-02)
Re: Two questions about compiler design cfc@shell01.TheWorld.com (Chris F Clark) (2004-03-06)
From: "Tom Linden" <tom@kednos.com>
Newsgroups: comp.compilers
Date: 2 Mar 2004 11:03:34 -0500
Organization: Compilers Central
References: 04-02-170
Keywords: design
Posted-Date: 02 Mar 2004 11:03:34 EST

    Tom Linden wrote:
    > Between the two came the n-tuple design that Freiburghouse developed
    > for PL/I which was widely used for a number of languages by Digital,
    > Wang, Prime, DG, Honeywell, CDC, Stratus and others I can't recall.


    Ah, fond memories rekindled. Freiburghouse's IL was fairly close to a
    useful UNCOL. At Prime, we had frontends targeting it for PL/I (at
    least 3 dialects), Fortran, Cobol, Pascal, Basic, Modula-2, and RPG
    that were shipped to customers and supported. In house, Kevin
    Cummings wrote an Algol-60 frontend as a fun project. I'm pretty
    sure a C frontend was written also, but it was not the one that got
    shipped to customers.
    The best part was that most of the backend and scaffolding for all
    those projects was common.


Prime's C frontend was written by Conboy and did indeed conform to
the PL/I IL and symbol table design. I no longer have the sources
around in electronic form, but a while back I found a listing of the
I-mode code generator for the 50 Series. In fact, the code generator
I did for the Alpha was a distant offshoot of this.


    At Prime we even built our own improved global optimizer for the IL.
    We started to build a third version of the global optimizer based on
    Fred Chow's Stanford thesis, but that fell prey to second-system
    syndrome and died: management did not want to fund the project unless
    it had no risks, and the project engineers kept adding things to
    reduce the risk but were never willing to say there was none. (I
    left when the project plan was over 100 pages with no end in sight.)


Ah, optimization. Interesting from several points of view. When
Cutler et al. did the VAX code generator they basically followed Aho,
Sethi and Ullman. But the interesting thing is that you can get 80%
of the optimizations with a rather simple optimizer, at a fraction of
the cost to develop - and maintain.
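To make that concrete, here is a minimal sketch in C of the kind of
cheap local transformation that buys much of the payoff; the tuple
layout and opcode names are invented for illustration, not the actual
VAX or Freiburghouse IR:

#include <stdio.h>

enum op { OP_LOADC, OP_ADD, OP_MUL };

struct tuple {
    enum op op;
    int     dst, src1, src2;   /* virtual register numbers */
    long    konst;             /* constant for OP_LOADC */
};

/* Fold ADD/MUL whose operands are both known constants. */
static void fold(struct tuple *code, int n)
{
    long val[256];             /* known constant value per vreg */
    int  known[256] = {0};

    for (int i = 0; i < n; i++) {
        struct tuple *t = &code[i];
        if (t->op == OP_LOADC) {
            val[t->dst] = t->konst;
            known[t->dst] = 1;
        } else if (known[t->src1] && known[t->src2]) {
            long a = val[t->src1], b = val[t->src2];
            t->konst = (t->op == OP_ADD) ? a + b : a * b;
            t->op = OP_LOADC;  /* rewrite in place as a constant load */
            val[t->dst] = t->konst;
            known[t->dst] = 1;
        } else {
            known[t->dst] = 0; /* result no longer a known constant */
        }
    }
}

int main(void)
{
    /* r2 = 3; r3 = 4; r4 = r2 * r3  -->  folds to r4 = 12 */
    struct tuple code[] = {
        { OP_LOADC, 2, 0, 0, 3 },
        { OP_LOADC, 3, 0, 0, 4 },
        { OP_MUL,   4, 2, 3, 0 },
    };
    fold(code, 3);
    printf("op=%d konst=%ld\n", code[2].op, code[2].konst);
    return 0;
}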


    In addition to the hardware vendors Tom listed above, I know two
    compiler houses made a good living off the technology, TSI and LPI.
    Later in my career I did a stint with LPI.


Well, actually, I don't know if you would call it a good living, but
TSI morphed into Kednos, which today does only PL/I for VAX and
Alpha, and next year Itanium.


    Having mentioned Fred Chow, it is worth tying this back to P-code.
    His thesis used a variant of P-code with register information that he
    called U-code. There were several different U-code frontends built
    also. I recall C and Fortran at DEC. I was there when they were
    retiring the U-code backend, replacing it with the GEM backend.


I believe U-code is what Hennessy used for the MIPS compilers.
Interestingly, most of the GEM staff came out of Digital's PL/I group,
which had developed VCG. Their focus was largely C, so they eliminated
frame pointers, which made stack unwinding a bit of a chore and indeed
required one to use undocumented features.


    Having experienced both, I think the U-code IL was better for many
    compiler purposes, but not as good an UNCOL. The global optimizer
    technology associated with U-code was certainly better, both simpler
    to maintain and more sophisticated. In contrast, I think the
    Freiburghouse code generator technology was better, especially from an
    ease-of-maintenance standpoint. Part of this was due to the fact that
    the Freiburghouse IL was not as close to the machine and let more
    frontend semantics peer through. For example, when looking at a
    "reference" to a variable in the Freiburghouse IL, one had to know
    which frontend produced the reference for certain aspects of its
    semantics--that made it a better UNCOL, because the frontend didn't
    have to bend so much to match some other language's memory model.


Yes, the IL was much like an abstract, overloaded assembly language.
In fact, as an experiment, I generated IL code for builtin functions
in the semantic pass of PL/I, essentially writing the algorithms in
IL to be inlined rather than called as library routines, and this
turned out to be very simple and almost self-documenting.
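In flavor, that builtin expansion worked something like the sketch
below; emit, new_temp, new_label, and the opcode names are all
invented for illustration, not the real compiler's interface:

#include <stdio.h>

typedef int temp;

static int next_temp = 1, next_label = 1;
static temp new_temp(void)  { return next_temp++; }
static int  new_label(void) { return next_label++; }

/* Stub emitter: a real semantic pass would append a tuple to the IL
   stream; here we just print it. */
static void emit(const char *op, int a, int b)
{
    printf("(%s %d %d)\n", op, a, b);
}

/* Expand ABS(x) inline as IL instead of emitting a library call:
   t = x; branch to L if t >= 0; t = -t; L:  */
static temp expand_abs(temp x)
{
    temp t = new_temp();
    int  L = new_label();

    emit("MOVE",  t, x);
    emit("BRGEZ", t, L);     /* branch to label L if t >= 0 */
    emit("NEG",   t, t);
    emit("LABEL", L, 0);
    return t;                /* result temp; no call emitted */
}

int main(void)
{
    temp x = new_temp();     /* pretend x holds the argument */
    expand_abs(x);
    return 0;
}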


    The distance from the machine helped at code generation time, because
    each IL operator stood for something more or less complete and one
    could then factor the cases at code generation time. There was a
    simple but useful "language" used by the code generators to implement
    those semantics. That made all the difference. It made the code
    generator into something one could read easily and understand the code
    sequences coming out.
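A rough idea of that case factoring, as a hedged sketch in C; the
operand kinds and target mnemonics are invented, not Prime's actual
code generator language:

#include <stdio.h>

enum loc_kind { IN_REG, IN_MEM, IS_CONST };

struct operand {
    enum loc_kind kind;
    int           reg;     /* register number, if IN_REG */
    long          value;   /* constant, if IS_CONST */
    const char   *mem;     /* symbolic address, if IN_MEM */
};

/* Generate target code for dst = a + b, factoring the operand cases. */
static void gen_add(int dst, struct operand a, struct operand b)
{
    if (a.kind == IN_REG && b.kind == IS_CONST)
        printf("  addi r%d, r%d, %ld\n", dst, a.reg, b.value);
    else if (a.kind == IN_REG && b.kind == IN_REG)
        printf("  add  r%d, r%d, r%d\n", dst, a.reg, b.reg);
    else if (a.kind == IN_REG && b.kind == IN_MEM) {
        printf("  load r9, %s\n", b.mem);   /* fetch memory operand */
        printf("  add  r%d, r%d, r9\n", dst, a.reg);
    } else
        printf("  ; ...remaining cases factored the same way\n");
}

int main(void)
{
    struct operand a = { IN_REG,   1,  0, 0 };
    struct operand b = { IS_CONST, 0, 42, 0 };
    gen_add(3, a, b);
    return 0;
}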


    In contrast, with the U-code IL, semantics were composed from more
    primitive operations and code generation worked by matching
    patterns--think BURG. While one can express all the same decisions
    that way and perhaps more, it is much less clear to a code generator
    writer how a small change will affect the code generated. Of course,
    exposing all of those details in the IL was part of what made the
    U-code optimizer good. The optimizer could easily rewrite unnecessary
    operations out of the program, because the operations were all exposed
    in the IL. In the Freiburghouse IL, many of those things were more
    implicit and thus inaccessible to the optimizer.
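For contrast, a toy sketch of the pattern-matching style, again with
invented node kinds and mnemonics rather than real BURG output:

#include <stdio.h>

enum kind { K_REG, K_CONST, K_ADD, K_LOAD };

struct node {
    enum kind    kind;
    struct node *l, *r;
    long         value;   /* constant or register number */
};

static void gen(struct node *n)
{
    /* Pattern: ADD(reg, const) => add-immediate */
    if (n->kind == K_ADD && n->l->kind == K_REG && n->r->kind == K_CONST) {
        printf("  addi r%ld, %ld\n", n->l->value, n->r->value);
        return;
    }
    /* Pattern: LOAD(ADD(reg, const)) => reg+offset addressing */
    if (n->kind == K_LOAD && n->l->kind == K_ADD &&
        n->l->l->kind == K_REG && n->l->r->kind == K_CONST) {
        printf("  load r9, %ld(r%ld)\n", n->l->r->value, n->l->l->value);
        return;
    }
    printf("  ; ...fall back to generic patterns\n");
}

int main(void)
{
    struct node r  = { K_REG,    0,  0, 4 };
    struct node c  = { K_CONST,  0,  0, 8 };
    struct node a  = { K_ADD,   &r, &c, 0 };
    struct node ld = { K_LOAD,  &a,  0, 0 };
    gen(&a);
    gen(&ld);
    return 0;
}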


    Perhaps the most striking thing about that difference is how it was
    actually localized to a small part of the different ILs. If one
    looked at the opcode listings for both, they would be mostly
    identical. The key difference was in the memory-access opcodes:
    Freiburghouse's IL had only a generic "reference" operation, where
    U-code had explicit load and store operations that used explicit
    arithmetic to calculate the exact memory location. However, that
    semantic difference ripples through the IL and completely changes
    everything. One could easily implement a Freiburghouse backend on
    nearly any architecture--memory-to-memory, a few registers, many
    registers, stack based, byte addressable, word addressable--the IL
    was architecture neutral. In contrast, U-code was optimized toward
    many-register machines with a load-store architecture (with a
    preference toward byte-addressable machines). If your machine
    doesn't look like that, U-code isn't quite as useful.


The ref operator was specifically designed with a view to efficiently
referencing a member of an indexed, based structure; U-code did not
have that capability. This also helped with Fortran COMMON.
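A small C sketch of the difference for a reference like p->a[i].f: a
generic-reference IL carries the whole access in one tuple and lets
the code generator pick an addressing mode, while a load/store IL
spells out the address arithmetic, roughly as below (the struct names
and layout are illustrative):

#include <stdio.h>
#include <stddef.h>

struct elem { long pad; long f; };
struct rec  { long hdr; struct elem a[10]; };

int main(void)
{
    struct rec r = {0};
    struct rec *p = &r;
    int i = 3;

    /* Load/store-style lowering: explicit address arithmetic,
       each step a separate IL operation the optimizer can see. */
    char *addr = (char *)p
               + offsetof(struct rec, a)          /* member a */
               + (size_t)i * sizeof(struct elem)  /* index i  */
               + offsetof(struct elem, f);        /* member f */
    *(long *)addr = 42;

    /* The generic "reference" form is just the source expression;
       the backend sees it whole and can fold it into one address mode. */
    printf("%ld %ld\n", p->a[i].f, *(long *)addr);
    return 0;
}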


    Of course, in the current time, we seem to have a paucity of different
    architectures, but that won't last forever, I hope. And, that brings
    up the final and perhaps most important point.


What gave rise to all the different architectures was essentially the
AMD 2901 bit-slice chips, which made it possible to rather easily put
together a microcoded architecture. The latest Xeon, I believe, has
125 million transistors, each of which is about the size of a DNA
molecule! It takes billions of dollars to build a plant that can
produce those kinds of chips, so I think diversity is not part of the
future.


    Freiburghouse designed his IL in a time when many architectures were
    around and none were prevalent. Making his IL into an UNCOL,
    particularly across different underlying machines, was important. For
    him, investing a few more hours into developing a code generator for a
    new machine/language combination actually meant money in his pocket,
    so he wanted that task simple enough that he could effectively do it.
    The ability to optimize the code on those machines was relevant, but
    the opportunities were more limited.


He actually formed most of his ideas while at Multics.


    The U-code architecture reflects the great shift to more of an
    architectural monoculture. Different ports were not as different (at
    least in terms of basic machine semantics), and what became more
    relevant was achieving optimized results on the machines that were
    becoming dominant: machines whose architecture is designed for C-like
    languages, byte-addressable uniform memory access, and a reasonably
    sized register file.


    Well, I've rambled on enough on this topic. I hope something in here
    was of interest....


Me too.


