Re: Reduced Machine Description (long)

harvard!cs.washington.edu!pardo (David Keppel)
25 Feb 89 18:13:41 GMT

From comp.compilers

Related articles
*Re: Reduced Machine Description (long) harvard!cs.washington.edu!pardo* (1989-02-25)**
Re: Reduced Machine Description (long) henry@zoo.toronto.edu (1989-03-06)

| List of all articles for this month |

From:	pardo@june.cs.washington.edu (David Keppel)
Newsgroups:	comp.compilers
Summary:	Long and opinionated. Contains at least one fact.
Keywords:	Portable compilers, Back ends
Date:	25 Feb 89 18:13:41 GMT
References:	<3365@ima.ima.isc.com>
From:	harvard!cs.washington.edu!pardo (David Keppel)
Organization:	U of Washington, Computer Science, Seattle

martins@rwthinf.uucp (Martin Schoenert) writes about:
>[RISC subset for easy retargeting]

I can read two proposals in to the posting:
* Define a common asm format.
* Define a common instruction set.

I think that *in general*, defining a common asm format isn't a huge
win, since the same retargetting information that is used by the
compiler can be used to build a retargetable assembler. Of course
this assumes that you're gonna have an assembler to retarget...

Using a RISC subset is generally a win. The moderator notes that some
architectures such as the 360 require certain operations to use an
even/odd register pair. Still, I've heard over and over (and over
and ... ) about a `RISC' port of an optimizing compiler to the '360,
and that it used only {y%|y<60} of the native instructions and (over
the test suite) produced very much faster code than the native
compiler.

On the other hand, DEC (I think) did studies on the VAX a few years
back that showed something like:

PROGRAM TYPE # of Instr. # using > 0.1% # using > 1.0 %
time or freq. time or freq
---------------------------------------------------------------
FORTRAN 88 68 29
COBOL 90 42 28
Linker 120 73 32
OS Kernel 145 112 37
OS Exec 138 77 42
Multiuser 194 111 33
6 Compilers 184 - >20
All Above 203 168 95

Note that while no *single* job category used more than about 40
instructions ``really heavily'', all job classes *together* made heavy
use of more than *double* that number of instructions. This
``specific application'' vs. ``general-purpose computing'' is one of
the oldest RISC vs. CISC debates.

RISC: ``We've looked at a bunch of programs, and they only really use
a few instructions heavily.''
CISC: ``Well you need to look at *all* program categories.''
RISC: ``Why? If it's really a problem, we'll build a new RISC.
Better to have 95% of your programs run faster.''
CISC: ``But you don't know apriori about your `customer base'. If you
only support your scientific programs, you'll be hosing the
DP world.''
RISC: ``...''
CISC: ``...''

The other major issue is that there is generally a large ``semantic
gap'' between programs and the target assembly language. The basic
problem of a ``semantic gap'' is that you write some high-level
operation that is then converted in to something inefficient, even
though the host machine may support it in hardware. This is
demonstrated at least two ways.

* The `strcpy' C function is supported directly in hardware by many
    machines. For most string move invocations, the function-call
    overhead is larger than the cost of the string copy, so it very
    nearly ``doesn't matter'' how you implement the strcpy() function.
    The general ``problem'' here is that `strcpy' isn't part of the
    *language*, so there's no way that the compiler can legitimately
    recognize it. You might write your own, error-checking strcpy, for
    instance. The ``correct'' way to do this in C is to #define strcpy
    in a machine- and compiler-specific #include, but nobody (that I
    know of...) does this.

    (Disclaimer: ``doesn't matter'' means ``doesn't matter whether you
    use the native strcpy instruction or a highly-optimized, hand-coded
    collection of other instructions.'' Also, note that I said
    ``*NEARLY* doesn't matter''.)

* There are (arguably) 4(?) different ways to implement the `mod'
    function/operator. While only one of them is ``mathematicaly
    symmetrical'' (according to some postings in another newsgroup :-),
    some of the others could be useful, while others could be faster
    as long as you are using only nonnegative numbers. This implies
    that you need a plethora of mod operators:

mod1 : Use the ``mathematically symmetrical'' method, no
matter how long it takes. We're gonna need to get it
right somehow.
mod2 : Use useful method #2, no matter how long it takes.
mod3 : Use the `native' mod operator, we'll never call this
with negative numbers.

    I know of no language that lets you do this. Instead, you have to
    take the language definition of mod and use it as best you can.
    For example, the recent proposed C standard says something like
    ``all bets are off when you use negative operands.'' While this is
    in the spirit of C, it leaves you writing code like:

if (y < 0)
z = MOD2(x,y);
else
z = x % y;

    EVEN WHEN THE UNDERLYING MACHINE spports the `correct' mod
    operation, you have to use the `cooked up' (slow) version, because
    you don't know that *all* machines will produce correct code.

So what does the ``semantic gap'' have to do with defining a RISC
subset of an architecture? A carefully-designed language could
portably let the compiler take care of recognizing some high level
instructions, so that the language could exploit the underlying
hardware. This is not to say that CISCs are justifed. Just that if
you have one, you *could* do a good job of putting all those
instructions to work.

Finally, a story. A few weeks ago, somebody asked me how the VAX
`INDEX' instruction works. I sat down with my book and an example and
convinced myself that I didn't know. I went and asked our local VAX
architecture guru. After I convinced him that I wasn't stupid, he
looked at the problem and said that he didn't know either. He talked
with somebody who had worked on the VAX architecture and had done a
lot of compiler work and *that* guy didn't know either. We then fired
up the VMS Pascal compiler, turned off optimization, turned on bounds
checking, and gave it a 3-d array reference. We expected it to use
the INDEX instruction three times.

INDEX <args> # compute first subscript
# first subscript bounds check
INDEX <args2> # compute second subscript
# second subscript bounds check
INDEX <args3> # compute third subscript
# third subscript bounds check

Instead, it used it once, and in my opinion did the wrong thing.

MULL <args> # compute first subscript
MULL <args2> # compute second subscript
INDEX <args3> # compute third subscript
# see if it is in the array *somewhere*.

The problem here is that you can decare an N by N by N array and give
it bogus parameters (e.g., 1, N+1, 1) and it will still think that the
reference is legal.

Finally, note that some (such as Wirth) believe that the penalty for
optimization is that you get big huge compilers that take forever to
write and produce buggy code. He argues that it is better to have
fast compilers that produce slow reliable code. If you need your code
to run 20% faster, buy a 20% faster machine.

;-D on ( So quick: find the fact! ) Pardo
--
pardo@cs.washington.edu
        {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Reduced Machine Description (long)

harvard!cs.washington.edu!pardo (David Keppel)25 Feb 89 18:13:41 GMT

harvard!cs.washington.edu!pardo (David Keppel)
25 Feb 89 18:13:41 GMT