Related articles:
WANTED: Rating of common optimization routines. napi@ms.mimos.my (1996-01-12)
Re: WANTED: Rating of common optimization routines. dlmoore@ix.netcom.com (1996-01-13)
Re: WANTED: Rating of common optimization routines. cliffc@ami.sps.mot.com (1996-01-15)
Re: WANTED: Rating of common optimization routines. bill@amber.ssd.hcsc.com (1996-01-15)
Re: WANTED: Rating of common optimization routines. jgj@ssd.hcsc.com (1996-01-16)
Re: WANTED: Rating of common optimization routines. rubin@scrugs.amt.tay1.dec.com (1996-01-17)
From: cliffc@ami.sps.mot.com (Cliff Click)
Newsgroups: comp.compilers
Date: 15 Jan 1996 21:00:25 -0500
Organization: none
References: 96-01-010
Keywords: optimize
napi@ms.mimos.my (Mohd Hanafiah Abdullah) writes:
> I wonder if there's any published report on the commonly employed
> optimization routines/procedures targeted for scalar-type RISC based
> applications such as:
>
> o register allocation
> o copy propagation
> o common subexpression optimization
> o peephole optimization
> o strength reduction
> o code motion of loop invariants
> o ...
>
> that is, regarding each one's contribution to the percentage of
> improvement on the target execution speed when turned on. It would be
> interesting to know which ones provide the highest improvement so
> that emphasis can be put on them. I tend to think in general that
> register allocation rates the highest, followed by peephole
> optimization and strength reduction.
Wow! If only things were that simple!
*EVERYTHING* interacts with everything else.
That is, the CPU/memory-subsystem/I/O-system/OS/application ALL
interact with the above optimizations, which (of course) all
interact with each other. These interactions make compilers fun...
Fer instance: you are optimizing the inner kernel of a matrix multiply
on a modern RISC. Your compiler (at first) does NO optimizations,
just lives with the naive code produced by the parser. You decide to do
strength reduction - some addressing multiplies become adds or shifts.
Did you dead-code-eliminate the adds? Your loop speeds up some, but
your loop is still full of loop-invariant and dead code. Suppose you
see a 2% gain.
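(To make that concrete, here's a toy C sketch of the kind of addressing
multiply strength reduction goes after. The kernel, names, and layout
are made up for illustration - this isn't real compiler output, just
the transform done by hand.)

    #include <stddef.h>

    /* Naive form: every a[i*n + k] hides an integer multiply
       in the address computation. */
    double row_dot_naive(const double *a, const double *b,
                         size_t i, size_t n)
    {
        double sum = 0.0;
        for (size_t k = 0; k < n; k++)
            sum += a[i * n + k] * b[k * n + i];
        return sum;
    }

    /* Strength-reduced form: the multiplies become an initial
       offset plus adds on each trip around the loop. */
    double row_dot_sr(const double *a, const double *b,
                      size_t i, size_t n)
    {
        double sum = 0.0;
        size_t ai = i * n;   /* steps by 1 per iteration */
        size_t bi = i;       /* steps by n per iteration */
        for (size_t k = 0; k < n; k++) {
            sum += a[ai] * b[bi];
            ai += 1;
            bi += n;
        }
        return sum;
    }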
Instead you try hoisting loop invariants (no DCE yet). Maybe you get
10%. A lot of multiplies get hoisted. Now you strength reduce again,
but the multiplies got hoisted - little gain from SR.
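(Again a toy sketch with a made-up kernel: i*n and the c[i*n + j]
address don't change inside the inner loops, so invariant hoisting
moves those multiplies and adds out - done by hand here.)

    #include <stddef.h>

    void matmul_hoisted(double *c, const double *a,
                        const double *b, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            size_t in = i * n;        /* hoisted: invariant in j and k */
            for (size_t j = 0; j < n; j++) {
                size_t cij = in + j;  /* hoisted: invariant in k */
                c[cij] = 0.0;
                for (size_t k = 0; k < n; k++) {
                    /* k*n + j still multiplies every time around */
                    c[cij] += a[in + k] * b[k * n + j];
                }
            }
        }
    }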
Ok, start with CSE. Lots of common addressing code, so some 10% gain.
Combine it with invariant hoisting - what happens? A bunch of CSE's
get hoisted, so the total gain isn't 20% - except that the CSE's that
stay in the inner loop represent a larger fraction of dynamically
executed instructions. Whammo - another big gain.
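(A toy sketch of the CSE step, again on made-up code: the address
expression i*n + j shows up three times, so it gets computed once and
reused.)

    #include <stddef.h>

    /* Before CSE: i*n + j is evaluated three times. */
    void scale_add_naive(double *c, const double *a, const double *b,
                         size_t i, size_t j, size_t n, double s)
    {
        c[i * n + j] = a[i * n + j] + s * b[i * n + j];
    }

    /* After CSE: computed once, shared by all three accesses. */
    void scale_add_cse(double *c, const double *a, const double *b,
                       size_t i, size_t j, size_t n, double s)
    {
        size_t ij = i * n + j;   /* the common subexpression */
        c[ij] = a[ij] + s * b[ij];
    }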
You look at the inner loop, and see a darned integer multiply - after
hoisting and CSE the inner loop is getting kinda sparse. NOW that
multiply represents a large fraction of all executed instructions.
You try SR again and *bing* suddenly get another 20%.
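(And a toy sketch of where the made-up kernel ends up with hoisting,
CSE, and strength reduction all applied by hand, plus keeping the
running sum in a local: no multiplies left in the inner loop. Not real
output from any compiler.)

    #include <stddef.h>

    void matmul_opt(double *c, const double *a,
                    const double *b, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            size_t in = i * n;          /* hoisted invariant */
            for (size_t j = 0; j < n; j++) {
                double acc = 0.0;
                size_t bk = j;          /* stands in for k*n + j */
                for (size_t k = 0; k < n; k++) {
                    acc += a[in + k] * b[bk];  /* no multiplies left */
                    bk += n;            /* strength-reduced stride */
                }
                c[in + j] = acc;
            }
        }
    }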
In short, optimizations interact. Applied standalone, you'll get one
number (perhaps a disappointing one). Applied in combination, the
same optimization can be crucial to getting a 2x speedup.
Cliff
--
Cliff Click Compiler Researcher & Designer
RISC Software, Motorola PowerPC Compilers
cliffc@risc.sps.mot.com (512) 891-7240
http://members.aol.com/mjclick1