Re: Books and other things which cost money to improve performance

George Neuner <>
Sun, 11 Jul 2010 00:58:00 -0400

          From comp.compilers

Related articles
Books and other things which cost money to improve performance (Colin Paul Gloster) (2010-07-05)
Re: Books and other things which cost money to improve performance (Hans-Peter Diettrich) (2010-07-06)
Re: Books and other things which cost money to improve performance (George Neuner) (2010-07-06)
Re: Books and other things which cost money to improve performance (Paul Colin Gloster) (2010-07-09)
Re: Books and other things which cost money to improve performance (Matthias-Christian Ott) (2010-07-10)
Re: Books and other things which cost money to improve performance (George Neuner) (2010-07-11)
Re: Books and other things which cost money to improve performance (Paul Colin Gloster) (2010-08-31)

From: George Neuner <>
Newsgroups: comp.compilers
Date: Sun, 11 Jul 2010 00:58:00 -0400
Organization: A noiseless patient Spider
References: 10-07-009 10-07-011
Keywords: optimize
Posted-Date: 11 Jul 2010 12:27:59 EDT

On Fri, 9 Jul 2010 10:05:30 -0400 (EDT), Paul Colin Gloster
<> wrote:

>On Tue, 6 Jul 2010, George Neuner sent an impressively large and
>excellent response:
>| - eliminating redundant or unnecessary calculations,
>| - inlining leaf functions,
>It had been claimed that these can actually slow down the code, but I
>could probably say that about anything.

By "redundant" and "unnecessary" I mostly mean loop-invariant
expressions. A decent programmer isn't likely to code such things
deliberately, but the compiler itself can introduce them by failing to
factor complex address calculations (particularly during loop fusion
or unrolling), or while inlining or performing closure creation
and/or closure conversion.

Inlining or tail-calling leaf functions eliminates the state
save/restore associated with call/return, avoids frame setup, and
gives the compiler's instruction scheduler more to work with. It is
almost always a performance win.

>One thing which has not been clear to me (one of the many things which
>I would like to check some time) is whether each core of the Intel(R)
>Core(TM)2 Quad CPU Q9450 which I use has its own L1 cache (probably,
>actually I may have checked the L1 already, but if so I have
>forgotten); and its own L2 cache? Similarly in the case of an L3 cache
>of an Intel Xeon (not that I use one)?

The Q9450? It is two dual-core dies in one package. Each core has its
own L1 caches (separate instruction and data), but there are only two
L2 caches, each of which is shared by the pair of cores on one die.

>|">Using C++ was really not a good idea. Expect to see a version with
>|>a lot more Ada on the Internet later this year.
>|YMMV but I think you are naive to expect a whole lot of performance
>|improvement to come simply from switching to Ada."
>I am already seeing great improvements, but of course I want more.
>|"Once you step
>|outside the restricted real-time subset, Ada and C++ provide equal
>|opportunity for a programmer to derail potential optimizations and
>|produce slow, bloated code. [..]"
>C++ is worse in this regard than Ada 2007. Ada 2007 is worse in this
>regard than Ada 95 in one respect and better in another. Ada 95 is
>worse in this regard than Ada 83.

As I said, YMMV.

If you code your C++ carefully, deliberately avoid array aliasing and
stick to indexing rather than explicit pointer manipulation, then Ada
has no real advantage over C++ in terms of optimization potential.

Similarly, if you write OO code in Ada, that will present most of the
same optimization problems as OO code in C++.

>|"By far, the bulk of research on performance optimization has been done
>|for Fortran. You'll need to be able to read Fortran and transliterate
>|Fortran examples into your preferred implementation language(s)."
>Look out for my reaction on what someone who used to lecture me said
>about FORTRAN in a paper which I am still drafting. (I might need to
>omit it though: the current draft's length is much longer than 200
>pages and the editor might be scared of a libel suit, though it is not

Sorry if you loathe Fortran, but them's the breaks. Fortran is still,
by far, the preferred language for high-performance number crunching.
More research has been done on optimizing and parallelizing Fortran
than on all other languages combined.

>|">Alfred V. Aho & Monica S. Lam & Ravi Sethi & Jeffrey D. Ullman,
>|>"Compilers: Principles, Techniques, & Tools",
>|>second edition, the first printing of the second edition is not acceptable
>|Another good intro book."
>Have the mistakes in the first printing been corrected? If so, how can
>I determine whether a stockist is selling a newer printing?

There are still a few mistakes in the 2008 printing, but AFAIK most of
the many egregious errors in the 2006 printing have been corrected.

>|" The 486 is still used in embedded systems, but it
>|has a different internal architecture and optimization for 486 is
>|quite different than for 386. The 486 and original Pentium have much
>|more in common internally than do the 386 and 486 (selecting
>|Pentium/P5 as the target is a cheap way to make some 486 programs run
>Not that it really matters, but I thought that integer operations were
>still quicker on 486s, just like on 386s.

Not sure exactly what you mean here.

The 486 executed "simple" register->register and immediate->register
ops twice as fast as the 386. The 486 had an on-chip cache shared by
code and data (386 caching was all external), and its instruction
prefetch buffer was twice as large. The 486 introduced store
buffering, so it didn't have to wait for memory writes to complete.
And, of course, it had the FPU on-chip (in the DX models), so floating
point code ran about 3 times as fast as on a 386/387.


If you're referring to 486 vs Pentium, the Pentium had dual integer
ALUs and a *much* faster FPU. For "simple" register ops it could
burst execution at twice the speed of a similarly clocked 486.
However, the Pentium's integer pipelines were asymmetric (only one
ALU was able to perform address calculations and register->memory
ops), so on average the Pentium was not twice as fast as the 486 but
rather about 130-140% of its speed. The Pentium had separate internal
code and data caches, a wider memory bus, and enhanced store buffering
with write-cancelling and write-combining, all of which made memory
I/O more efficient than on the 486.


>what was the first book in the [Gems] series which was not very specific
>to 386s and/or 486s?

Sorry, I can't remember ... it's been way too long. I had access to a
number of them through various jobs, but I only have a couple of my own.

>|">Thomas Pittman,
>|>"Practical Code Optimization by Transformational Attribute Grammars Applied
>|to Low-Level Intermediate Code Trees", Interesting but irrelevant."
>Why would you say that?

Despite the title, it is not about optimizing code ... it is about
implementing a compiler in a particular style that emphasizes the use
of formal grammars to describe IR transformations.

It's a good read if you intend to write a compiler, but it won't help
you understand code optimization.

