|Re: Folk Theorem: Assemblers are superior to Compilers email@example.com (Mohd Hanafiah Abdullah) (1993-10-26)|
|Superoptimizer (was Re: Folk Theorem ...) firstname.lastname@example.org (1993-10-31)|
|Re: Superoptimizer (was Re: Folk Theorem ...) email@example.com (1993-11-01)|
|Re: Superoptimizer (was Re: Folk Theorem ...) synaptx!thymus!daveg@uunet.UU.NET (Dave Gillespie) (1993-11-02)|
|From:||Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>|
|Keywords:||optimize, assembler, comment|
|Date:||Tue, 2 Nov 1993 08:13:16 GMT|
Michael Gschwind <uunet!vlsivie.tuwien.ac.at!mike> writes:
> There is a program called superoptimizer, which does the
> trick (for short sequences!) I cannot imagine that any human would come up
> with the same tricks that this program came up by searching all possible
> code combinations by executing them on the spot and comparing results.
I've looked at the GNU superoptimizer and heard of one or two others.
They're not the panacea they may sound like...
For one, superoptimizers can't "understand" the code they write. (They
have so many options to try, they can't afford the time to analyze the
code even if they wanted to!) The GNU superoptimizer feeds random inputs
through the generated code to see if the right outputs come out. This
technique is fine as a heuristic but it leaves the optimizer vulnerable to
boundary cases that don't "happen" to be tested. It's not safe to use a
superoptimizer without carefully reading its output, which starts to feel
more like assembly-language programming again rather than using a
Also, while a processor may have only, say, a couple dozen interesting
"instructions", it actually has a tremendous number of instruction words.
The "add" instruction might give you the choice of three registers
(32*32*32=32768 different things to try), or two registers and a 16-bit
immediate operand (32*32*65536=67,108,864 possibilities). A brute force
search is going to be totally swamped by this, and in fact if you look at
the GNU documentation they mention they only try a few simple immediate
operands like +1 and -1. If your clever trick involves the number 17, or
0xff, then you're out of luck.
Finally, optimal code sequences often involve loop unrolling on modern
processors. That souped-up i860 loop I mentioned before was 4-way
unrolled, each iteration with different registers and often ordered
slightly differently, in order to take advantage of the delays through the
floating-point pipelines. It's a total of 40 instructions long. Despite
arising from a fairly simple three-line loop in C, there's no way a brute
force search could find it in a human lifetime. (The unrolled loop also
involves immediate operands, in the form of load offsets, ranging from 0
all the way to 72).
Again, this sounds like a harsh test case, but it happened to be the loop
our application spent most of its time in.
The thing that superoptimizers *are* good at, though, is finding nifty
gems like "x = x & -x", the purpose of which is left as an exercise for
[Gentle readers: please don't send in what x & -x does. It's a clever but
well known hack. -John]
Return to the
Search the comp.compilers archives again.