From:  Jerry Leichter <leichter@smarts.com> 
Newsgroups:  comp.compilers,comp.lang.asm.x86 
Date:  9 Jul 1997 23:18:04 -0400 
Organization:  System Management ARTS 
References:  97-06-071 97-06-082 97-06-110 97-06-140 97-07-023 
Keywords:  optimize, architecture 
 Here's what's wrong with the superoptimizer:

 (1) As the title above suggests, the SO produces the *shortest*
 possible program. This is not always the same thing as the
 *fastest* possible program.
The superoptimizer does a fairly dumb search over all possible code
sequences and chooses the one that is "best" according to some measure.
The published example uses "smallest number of bytes" to define "best".
If, given a code sequence, you can determine how long it takes to
execute, you can trivially modify the superoptimizer to find the code
sequence that runs fastest.
For most widely-used machines, execution models that will give you exact
timing measurements are available. For typical RISC machines, the
models depend more on cache interactions than details of the
instructions used; for CISC machines, instruction timing can vary widely.
In either case, even experienced assembly language programmers are forced
to use heuristics -- the interactions are just too complex. (In some
real-time applications, where *exact* timing information is needed, the
only alternative is to stick to simple machines and simple memory
architectures. The result is that the average performance is much less
than could be attained by more aggressive design -- but the average case
is also the worst case, and it's the worst case that's the concern.)
If, for some odd reason, you wanted the program that had the fewest
possible number of 1 bits in its machine-code representation, the
superoptimizer could find that for you, too.
 (2) Optimization is provably NP-Complete and that's how the SO runs.
 In the example above, the sample function they computed required
 40 hours to compile to three machine instructions.... True
 optimization is an intractable problem.
It's intractable for humans, too. So what? In few cases do we really
care to find the *best possible* program; we typically just want to get
some large improvements. Computers doing dumb searches can do that.
Computers using more clever searches can do better. (As I recall, the
superoptimizer didn't really try to be clever.)
In any case, NP-completeness is a statement about asymptotics. For a
fixed size input and any reasonable measure of "best program", the
computation tree is bounded. It's large, but so what?
Computers using moderately intelligent brute force play chess (at the
moment) better than any human being. Do they play "optimal" chess?
Highly unlikely -- they certainly aren't *trying*.
If the goal, as in the superoptimizer, is to find the optimal solution,
then, yes, the costs are impractically high for all but small examples.
Then again, humans would have trouble consistently finding the optimal
solution even for small examples!
If the goal is to find *very good* solutions for moderately small pieces
of code (where a "solution" is defined strictly in terms of inputs and
outputs, and "moderately small" these days is probably "a few tens of
instructions in the obvious implementation"), then it would be quite
straightforward to write programs that beat all humans most of the time,
and almost all humans all of the time. The reason such programs aren't
used is economic and psychological: People are usually unwilling to run
compilers for tens of hours, but they are willing to pay programmers to
work for weeks on similar pieces of code. One of these days, that will
change. (People don't do design-rule checking for IC's any more -- they
let machines run for hours, days, even weeks doing that. It's not much
different.)
 ...BTW, shortly after the article appeared (discussing the "signum"
 function, as I recall), I rewrote the signum function in a different
 fashion and it only took me about 18 hours to come up with my solution
 (which was just as good as, but obviously not better than, the
 superoptimizer's).
Since the article was written in 1987, readily-available hardware has
gotten at least 100 times faster. The same program would now run in
about 25 minutes. Has your brain sped up to match? :)
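For readers who don't recall it: signum maps x to -1, 0, or +1 according
to its sign. A branch-free formulation is sketched below in Python for
illustration only -- the superoptimizer's published result was a
machine-specific instruction sequence built on carry-flag tricks, not
this source-level identity.

```python
def signum(x):
    # In Python, (x > 0) and (x < 0) are 0/1 integers, so this computes
    # +1, 0, or -1 with no branches at the source level. Not the
    # superoptimizer's actual sequence; just the same function.
    return (x > 0) - (x < 0)
```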
 Jerry
