Re: Folk Theorem: Assemblers are superior to Compilers

Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Wed, 27 Oct 1993 21:43:56 GMT

          From comp.compilers

Related articles
[3 earlier articles]
Folk Theorem: Assemblers are superior to Compilers ssimmons@convex.com (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers vick@wotangate.sc.ti.com (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers leichter@thorium.rutgers.edu (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers cliffc@rice.edu (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers macrakis@osf.org (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers amn@ubik.demon.co.uk (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers synaptx!thymus!daveg@uunet.UU.NET (Dave Gillespie) (1993-10-27)
Re: Folk Theorem: Assemblers are superior to Compilers winikoff@munta.cs.mu.OZ.AU (1993-10-28)
Re: Folk Theorem: Assemblers are superior to Compilers prener@watson.ibm.com (1993-10-28)
Folk Theorem: Assemblers are superior to Compilers Mark_Prince@gec-epl.co.uk (1993-10-28)
Re: Folk Theorem: Assemblers are superior to Compilers mps@dent.uchicago.edu (1993-10-28)
Re: Folk Theorem: Assemblers are superior to Compilers toon@moene.indiv.nluug.nl (1993-10-28)
Re: Folk Theorem: Assemblers are superior to Compilers raymondc@microsoft.com (1993-10-28)
[19 later articles]
| List of all articles for this month |
Newsgroups: comp.compilers
From: Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Keywords: assembler, optimize, performance
Organization: Compilers Central
References: 93-10-114
Date: Wed, 27 Oct 1993 21:43:56 GMT

This thread seems short on concrete examples, so here are a few from my
personal experience:


* There will always be things no compiler writer will think of.


My favorite example comes from an i860 loop we wrote here, which involved
a bunch of multipy-adds followed by something of the form max(min(x,
limit), -limit), i.e., constrain abs(x) <= limit. Branches are awful on
the i860, but we found a way to use the graphics unit's Z-buffer check
instruction to our advantage. This instruction works in parallel on two
32-bit Z values in adjacent registers, computing x = min(x, limit) using
unsigned integer arithmetic. Some reflection on the IEEE floating-point
format shows that an unsigned integer comparison is the same as a
magnitude comparison of two floats with equal signs. We had cycles to
spare in the integer unit, so we set it to grafting sign bits from x's
onto the limit value, while the FP/graphics side was executing Z-check
instructions.


This loop happened to account for 80-90% of the total runtime of our
program, so it was worth spending several man-days to figure this trick
out. I would be mighty impressed to see any compiler, even a brute-force
superoptimizer, discover this trick.


* There will always be things a high-level language can't easily express.


Drawing again from the fertile ground of i860 programming :-), one of the
keys to getting good performance on that chip is taking advantage of its
huge memory bandwidth. It can load or store a quadword (four IEEE single
floats) in a single clock cycle. BUT... it can only do that if the
address is quadword aligned. How many languages have a declaration that
tells the compiler that a given pointer, or even a given integer, is a
multiple of 16?


(In C, you could make this particular declaration by enclosing your code
in "if (((int)p & 15) == 0) { ...}". Does anyone know of any compiler
that would actually use this hint? It's not hard to think of subtler
hints that would be hopeless to communicate to the compiler.)


* One big disadvantage of assembly language is the cost of program
maintenance.


As a modest example, suppose you have delayed branches (and you don't
cheat and make your assembler fill delay slots for you). It's not hard
for the human programmer to figure out how to use the delay slots, but it
adds hidden redundancy to the code. Say you've got the sequence:


L1: add 42,a
ld 16(b),c
...
bra L1
nop
...
bra L1
nop


It's trivial to modify this loop to:


L1: add 42,a
L2: ld 16(b),c
...
bra L2
add 42,a
...
bra L2
add 42,a


but now the instruction "add 42,a" has secretly spread itself throughout
the rest of the code. If you want to change it later, it's a giant
nightmare.


The nice thing about a compiler is that it is willing to "rethink" all
these optimizations every time you change the code. Humans don't have the
time or patience to do this well, so instead they write sub-optimal code
for the sake of maintainability.


-- Dave
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.