Re: Books and other things which cost money to improve performance



From: Paul Colin Gloster <>
Newsgroups: comp.compilers
Date: Fri, 9 Jul 2010 10:05:30 -0400 (EDT)
Organization: Compilers Central
References: 10-07-009
Keywords: optimize, books
Posted-Date: 09 Jul 2010 21:35:02 EDT

On Tue, 6 Jul 2010, George Neuner sent an impressively large and
excellent response:
|"[..] |
| |
|The biggest problems for compilers are indirect data access |
|(pointers), object aliasing (pointers again), indirect function calls |
|(yet more pointers)," |



|" OO" |


|" and flexible data structures (did I mention pointers?)" |

You might have. Let me read through that post again to check ;)

|"[..] |
| |
|The biggest problem for developers on modern OSes is the failure to |
|recognize the impact of virtual memory on large data sets." |

This is not too much of a problem for us at the moment (average memory
consumption has tended to be just over one gigabyte per core: with
each core running a separate, unrelated simulation). Last week I was
seeing memory consumption slightly exceeding all the physical RAM, but
as I have only four gigabytes I plan to solve this by getting more
RAM. We do use data, but the code is to a large extent based on models
instead of data. We will be adding data which we do not currently
have, but this might not really increase the memory requirements. Of
course, it depends on how much we add.

|"Tricky coding and trying to outguess the compiler can easily result in |
|_worse_ performance." |

Yes. So far, I have only slowed the code down slightly in this
way (and I notice because I check and hence undo such changes),
whereas adding additional supposed optimization switches (as in
        -O3 -ftree-switch-conversion -ffast-math -fno-keep-static-consts
        -fmodulo-sched -fmodulo-sched-allow-regmoves -fgcse-sm -fgcse-las
        -fgcse-after-reload -fsched-spec-load -fsched2-use-superblocks
        -fsched2-use-traces -fsee -fselective-scheduling
        -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops
        -fno-conserve-stack -ftree-loop-linear -floop-block
        -ftree-loop-distribution -ftree-loop-im -funswitch-loops
        -ftree-loop-ivcanon -funroll-loops -fprefetch-loop-arrays
        -fipa-type-escape -fipa-matrix-reorg
        -fvariable-expansion-in-unroller -freorder-blocks-and-partition
        -fweb -findirect-inlining -combine -ftree-coalesce-vars
        -frename-registers -freciprocal-math -fassociative-math
        -fno-branch-count-reg -fno-zero-initialized-in-bss
        -fno-ira-share-save-slots -fmerge-all-constants
        -fno-ira-share-spill-slots -fkeep-inline-functions -fira-coalesce
        -freschedule-modulo-scheduled-loops -fno-guess-branch-probability
for G.C.C. instead of
-O3 -findirect-inlining -ftree-switch-conversion
) has slowed it down significantly. Of course, maybe I should be more
selective about which subprograms are compiled with particular
switches: one of many things to try sometime.

|" Better algorithms, clean simple code, making |
|sure data is properly aligned"

Data alignment is an issue which I had been concerned about but I have
not yet had time to check whether or not it is causing a problem.

|" and reorganizing data to better match |
|VMM paging behavior (tiling, etc.) buys you much more than playing at |
|being a compiler." |

If necessary I might contribute to a compiler to improve its output
for this code.
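For instance, the loop tiling George mentions can be sketched as follows (matrix and tile sizes are arbitrary assumptions, not measurements from this codebase):

```c
#define N 256
#define B 32   /* tile edge; tune to cache line / page size (assumption) */

/* Tiled transpose: each BxB tile is read and written with good
   locality, so the inner loops touch far fewer distinct pages than a
   naive row-by-column sweep over the whole matrix. */
static void transpose_tiled(double dst[N][N], const double src[N][N])
{
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)
                for (int j = jj; j < jj + B; j++)
                    dst[j][i] = src[i][j];
}
```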

|"It is true that the compiler must be conservative ... it is not |
|supposed to alter the program's semantics (modulo compile "modes" and |
|conditional compilation) ... but it is also true that good compilers |
|are smarter and more observant than all but the very best developers. |
|If a good compiler passes up an opportunity to apply some meaningful |
|optimization it is *usually* because it has seen a problem the |
|developer has overlooked." |

Certainly not in this codebase, I can guarantee you that, but the
results are not yet published.

I am not boasting here though: of course the compilers are much better
at producing fast executables than I am, even for this codebase. What
I am pointing out is that compilers do overlook significant things
which I notice, and I replace the code with versions for which the
compilers easily produce faster code.
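A typical example of such a thing is possible pointer aliasing: C99's restrict is one way to hand the compiler a guarantee it cannot infer on its own (the function below is illustrative, not from Cosima):

```c
/* Without restrict the compiler must assume out[] may overlap a[] or
   b[] and reload operands on every iteration; with restrict (C99) it
   may keep values in registers and vectorize the loop. */
void axpy(int n, double alpha,
          const double *restrict a,
          const double *restrict b,
          double *restrict out)
{
    for (int i = 0; i < n; i++)
        out[i] = alpha * a[i] + b[i];
}
```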

|">Various tricks in tiger books; the red dragon book; and "Computer |
|>Architecture: A Quantitative Approach" (or superior variants of those |
|>tricks as appeared in the books "More Tricks of the Game Programming |
|>Gurus" and the excellent "Black Art of Macintosh Game Programming" by |
|>Kevin Tieskoetter) have helped me to speed up an as-yet unreleased |
|>version of the Monte Carlo particle simulator Cosima ( |
|>WWW.MPE.MPG.De/MEGA/megalib.html ). |
| |
|That's great." |


|" But keep in mind that *many* of those "tricks" are |
|special purpose and/or architecture specific" |

Yes, but if even one is good then it is worthwhile.

By the way, one of my favorite tricks from "Black Art of Macintosh
Game Programming" (a book only for Motorola (before it became
Freescale) 68xyz and PowerPC processors) is generally useful.

|" ... moving up or down one |
|generation in CPU or changing the number of cores may invalidate |
|everything and result in a net loss of performance." |

This was pointed out in those books. However, those books went out of
print in the 1990's and they still sped up this codebase in 2010 for a
processor which did not even exist at the time.

As for changing the number of cores: earlier this week I did try
switching from a single core to four cores. (I did this merely by
adding G.C.C.'s -ftree-parallelize-loops=4 switch because I am not
ready to tackle the code for it (and I might never bother because I
have many things to do which would have definitely positive results)
and because the same urgent budgetary deadline which rushed the list
of books at the start of this thread was also rushing me to determine
whether we should buy FPGAs from XtremeData.) It was much slower and
used basically just one core. The pseudoFORTRAN version supposedly did
run on a GNU/Linux cluster, so if true, then it may be possible to
exploit more than one Intel core or an FPGA or a GPU, but maybe not
now. So I doubt that I will be recommending that we order big FPGAs
next week.
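For what it is worth, my understanding is that -ftree-parallelize-loops only splits loops whose iterations it can prove independent, which may explain seeing just one busy core. A sketch of the contrast (function names invented):

```c
/* A loop shape -ftree-parallelize-loops=4 can in principle split
   across threads: independent iterations and a countable trip count. */
void scale(int n, double *a, double k)
{
    for (int i = 0; i < n; i++)
        a[i] *= k;
}

/* A loop it cannot parallelize: each iteration reads the previous
   one's result (a loop-carried dependence), so the iterations must
   run in order. */
void prefix_sum(int n, double *a)
{
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```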

(I think I might recommend that we still buy a cheap PCI or PCIe FPGA
next week, if the thousands of Euro on software; probably more than
a thousand Euro on books; and whatever I might recommend for
consultants (I am not sure whether consultants are covered by one of
the current budgets) do not exhaust these budgets.)

|"[..] |
| |
|[..] |
| - eliminating redundant or unnecessary calculations, |
|[..] |
| - inlining leaf functions, |
| etc." |

It had been claimed that these can actually slow down the code, but I
could probably say that about anything.

|"CPU specific optimizations apart from data alignment are much more |
|difficult to analyze for value and overall are worth less than a more |
|general code optimization in the face of multi-tasking. For example, |
|it is feasible to statically determine optimal data placements so as |
|to minimize direct mapped cache conflicts ... but there's little point |
|to doing this complicated analysis if the cache is expected to be |
|polluted by unrelated programs." |

One thing which has not been clear to me (one of the many things which
I would like to check some time) is whether each core of the Intel(R)
Core(TM)2 Quad CPU Q9450 which I use has its own L1 cache (probably,
actually I may have checked the L1 already, but if so I have
forgotten); and its own L2 cache? Similarly in the case of an L3 cache
of an Intel Xeon (not that I use one)?

For the time being, I tend to run one simulation per core and have
little else taking up much CPU slices. (Well, relatively speaking, for
a general multitasking desktop. Many other processes do tend to be
running, but they tend to take up less than 1%. In other words, I
close down heavyweight programs and take my hands off the keyboard and
mouse and read a book; go home and sleep; and come back some other day.)

|">Using C++ was really not a good idea. Expect to see a version with |
|>a lot more Ada on the Internet later this year. |
| |
|YMMV but I think you are naïve to expect a whole lot of performance |
|improvement to come simply from switching to Ada." |

I am already seeing great improvements, but of course I want more.

|" Once you step |
|outside the restricted real-time subset, Ada and C++ provide equal |
|opportunity for a programmer to derail potential optimizations and |
|produce slow, bloated code. [..]" |

C++ is worse in this regard than Ada 2007. Ada 2007 is worse in this
regard than Ada 95 in one respect and better in another. Ada 95 is
worse in this regard than Ada 83.

|"[..] |
| |
|>We unfortunately target x86s, but we will consider targeting other |
|>types of architectures such as FPGAs; GPUs; and supercomputers. As we |
|>already have x86s and might adopt something much faster, I do not |
|>expect that we shall target RISC machines though they probably would |
|>had been better to start with than x86s. |
| |
|There is little internal difference between a modern RISC and a modern |
|x86 - the x86 has too few programmer visible register names and a |
|complex instruction set, but behind the instruction decoder it is an |
|out-of-order superscalar RISC with hundreds of rename registers." |

Thanks for the tip.

|"In actual fact, the densely packed complex instruction set of the x86 |
|helps reduce memory pressure vs. the fixed width instructions of a |
|traditional RISC" |

This is conceivable. It is called the RISC penalty.

|" (not that any modern RISCs are "traditional" - they |
|all sport variable length instructions now)." |

I did not know that.

Not that the supposedly Reduced Instruction Set Computers actually
have reduced instruction sets. Can you think of one supposed RISC which
does not have an instruction for conditional moves?
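For context, the usual source pattern that compilers turn into a conditional-move instruction (x86 cmov, and its equivalents elsewhere) is a simple branchless selection (illustrative only):

```c
/* Compilers commonly emit a conditional move for this ternary,
   avoiding a hard-to-predict branch. */
int select_max(int a, int b)
{
    return a > b ? a : b;
}
```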

|"There is an ongoing thread in comp.arch talking about this right now." |

Thank you for the tip.

|"The one place that x86 compilers routinely do stumble is in stack |
|handling. The x86 prefers certain data alignment but none is actually |
|*required* (even for SSE) ... and it is quite hard to guarantee that a |
|value on the stack will always be properly aligned for most efficient |
|handling - particularly when some performance obsessed developer has |
|switched off stack framing and function preambles and makes |
|(in)judicious use of alloca() (or local equivalent)." |

I hope that I will not be such a twit. Actually I have been inspired
by FORTRAN to think about replacing stack allocations with heap
allocations. It is one of those many ideas I do not yet have time to
try. One concern though is that I do not know what limits to set, so I
think that I might wait until after the first Ada/assembly/pseudoC++
release before I release such a version (in preparation for a barrage
of complaints from users that I set limits too low).
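A minimal sketch of that stack-to-heap idea, with the limit I worry about made explicit (MAX_SCRATCH and the function name are invented tunables, not recommendations):

```c
#include <stdlib.h>

/* Instead of `double scratch[n];` (a stack allocation that can
   overflow for large n), allocate from the heap and refuse loudly
   past a limit. MAX_SCRATCH is an arbitrary placeholder value. */
#define MAX_SCRATCH ((size_t)64 * 1024 * 1024)   /* in doubles: assumption */

double *make_scratch(size_t n)
{
    if (n > MAX_SCRATCH)
        return NULL;              /* caller decides how to complain */
    return malloc(n * sizeof(double));
}
```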

|"[..] |
| |
|>Feedback [on] the following books ... |
| |
|I haven't read many of these, but if you've studied compiler basics |
|already, then you can safely ignore any intro books (if there are |
|chapters on parsing, you can pretty much forget about it)." |

I know what you meant but I disagree. "The Art of Compiler Design" for
example had a lot of information missing from other introductory
books. Even for books which treat the same topics, it can be nicer to
refresh myself on the basics by reading an easier book to complement
reading a more advanced book. I have found this to be true (for me)
for a number of fields: not just compilers.

|"For optimization purposes you need to concentrate on various forms of |
|dependence analysis. It's important to know what the compiler looks |
|for - and what it does *not* want to find - when it tries to optimize |
|code." |

Thanks for the tip.

|"By far, the bulk of research on performance optimization has been done |
|for Fortran. You'll need to be able to read Fortran and transliterate |
|Fortran examples into your preferred implementation language(s)." |

Look out for my reaction, in a paper which I am still drafting, to
what someone who used to lecture me said about FORTRAN. (I might need
to omit it though: the current draft is already much longer than 200
pages and the editor might be scared of a libel suit, though it is not
libelous.)

|">John M. Levesque & Joel W. Williamson, |
|>"A Guidebook to Fortran on Supercomputers", |
|Don't need this unless you're actually writing Fortran." |

After a lot of effort I finally found a stockist for this immediately
before I saw this post. Thanks for saving us from ordering it.

|"[..] |
| |
|>Monica S. Lam, |
|>"A Systolic Array Optimizing Compiler" |
|Interesting but irrelevant unless you actually have a systolic array. |
|SAs are essentially a fixed pipeline of (relatively) simple processing |
|steps that together perform some complex computation on streaming |
|data. They are almost always dedicated, single purpose machines and |
|are not very common." |

I will give this a miss then.

|">Alfred V. Aho & Monica S. Lam & Ravi Sethi & Jeffrey D. Ullman, |
|>"Compilers: Principles, Techniques, & Tools", |
|>second edition, the first printing of the second edition is not acceptable |
|Another good intro book." |

Have the mistakes in the first printing been corrected? If so, how can
I determine whether a stockist is selling a newer printing?

|"[..] |
| |
|>John R. Ellis, |
|>"Bulldog: A Compiler for VLIW Architectures", |
|Interesting, but focused on low level instruction selection which is |
|way below the level of optimizations you want to target." |

Well, not if I am willing to modify compilers' backends. However, as I
am targeting a current Intel machine I shall probably not bother with
this book. I am sorry to have wasted your time with this entry.

|"[..] |
| |
|>Pappas & Murray, |
|>"80386 Microprocessor Handbook", |
|Way out of date." |

Okay, I will give it a miss.

|" The 486 is still used in embedded systems, but it |
|has a different internal architecture and optimization for 486 is |
|quite different than for 386. The 486 and original Pentium have much |
|more in common internally than do the 386 and 486 (selecting |
|Pentium/P5 as the target is a cheap way to make some 486 programs run |
|faster)." |

Not that it really matters, but I thought that integer operations were
still quicker on 486s, just like on 386s.

|"[..] |
| |
|>"Games Programming Gems 4", maybe other books in the series |
|The "Programming Gems" books, in general, are worth reading ... but |
|keep in mind that many of their tricks are architecture and |
|program/purpose specific rather than general principles to follow." |

As if I have not already taken up too much of your time, what was the
first book in the series which was not very specific to 386s and/or 486s?

|">Thomas Pittman, |
|>"Practical Code Optimization by Transformational Attribute Grammars Applied |
|to Low-Level Intermediate Code Trees", |
|Interesting but irrelevant." |

Why would you say that?

|"[..] |
| |
|Many of the others you listed (did you cut-n-paste a biblio search?) |
|are too old, too basic, too general, or just irrelevant to your stated |
|purpose." |

This thread was actually started in April 2010 when I sent an email
saying

I might post a list of books under consideration later, [..]

which was not sent by the moderator onto the newsgroup, as he responded
"I don't see how anyone can help you unless you give us some hint of
what sort of performance you want to improve. [..]"

I was working on revising the post, but I did not get it into a state
I was happy with (I wanted to include profiling information) when I
was told earlier this week that I should have ordered things to buy
last month. (Though it was hinted to me yesterday that I could order
next week.)

So I did not have time to make all the book entries uniform. Most of
the book titles were found by late June by doing things like
grep -i book * | grep --invert-match -i bookkeep \
  | grep --invert-match 'new compiler tools, and book reviews.' \
  | grep --invert-match 'books, in printed journals, in on-line and off-line archives, CD-ROMs, and' \
  | grep --invert-match 'd be intereted in editing books for later years, let me' \
  | grep --invert-match ' book directly from me; send email' \
  | grep --invert-match 'sample chapter of the book http://www.CS.Princeton.EDU/software/lcc/' \
  | grep --invert-match 'Or read any of the many books on the topic. Here are a few of them.' \
  | grep --invert-match 'Also see message 93-01-155 which reviews many compiler textbooks.' \
  | grep --invert-match 'as a book, is on the web at' \
  | grep --invert-match 'Addison Wesley, 1986, ISBN 0-201-10088-6, the "dragon book".' \
  | grep --invert-match 'A large book containing the complete source code to a reimplementation of' \
  | grep --invert-match 'subset-of-C compiler through the book. (Recommended by Eric Hughes' \
  | grep --invert-match 'Ansi C, Lex and Yacc," McGraw Hill Book Co, 1990, ISBN 0-07-707215-4.' \
  | grep --invert-match 'introductory text before getting into the ' \
  | grep --invert-match 'read - an excellent example of what textbooks should all be like.' \
  | grep --invert-match 'James E. Hendrix, "The Small-C Compiler", 2nd ed., M&T Books, ISBN' \
  | grep --invert-match '0-934375-88-7 <Book Alone>, 1-55851-007-9 <MS-DOS Disk>, 0-934375-97-6' \
  | grep --invert-match '<Book and Disk>.' \
  | grep --invert-match 'book, producing a product, etc., I would appreciate an acknowledgement of' \
  | grep --invert-match 'book has many typographical errors, and readers should be suspicious of the' \
  | grep --invert-match 'thorough....and explains every single aspect of the compiler. The book' \
  | grep --invert-match 's an extremely interesting book, check it out. (Out of print,' \
  | grep --invert-match 'Andrew Tucker <> writes: This 512-page book presents' \
  | grep --invert-match 'assembly. All source code is provided in print and on disk. This book is' \
  | grep --invert-match 'full text of the book and a lot of other stuff.)' \
  | grep --invert-match 'comments that the book is a piece of junk. The code that is contained in' \
  | grep --invert-match 'the book is full of bugs, and the code that it generates will not work.' \
  | grep --invert-match 'Joe Snyder <> writes: This unfortunately-titled book' \
  | grep --invert-match 'The most interesting things about this book are: 1) its ability to' \
  | grep --invert-match 's-how-to-code-it book. The author presents the' \
  | grep --invert-match 'This book makes the design and implementation of a compiler look easy. I' \
  | grep --invert-match 'as the example language for his book, plus a little pseudo code where' \
  | grep --invert-match 'Gabriela O. de Vivo <> writes: The book covers the' \
  | grep --invert-match 's easy to port the book' \
  | grep --invert-match 'compiler books that focus more on a batch approach and optimised' \
  | grep --invert-match 'combine a graduate or advanced level course textbook with a' \
  | grep --invert-match 'left unaddressed. The preface declares the book' \
  | grep --invert-match '<> writes: This book "attempts a snapshot of the' \
  | grep --invert-match 'aspects of program translation and compilation, this book would make a good' \
  | grep --invert-match 'Joe Snyder <> writes: This book is only really useful' \
  | grep --invert-match 'in the book, a good C compiler, and a determination to absorb the' \
  | grep --invert-match 'overview of compiler architecture at the beginning, this book is for' \
  | grep --invert-match 'hand holding going on with a book like this.' \
  | grep --invert-match 'Jackson <> writes: This book shines with the' \
  | grep --invert-match 'Smalltalk Sourcecode and Handbook (English): See WEB' \
  | grep --invert-match 'Java Sourcecode and Handbook (German): See WEB' \
  | grep --invert-match 'd be interested in editing books for later years, let me' \
  | grep --invert-match 'as a book, is on the web as' \
  | grep --invert-match 'that the book has many typographical errors, and readers should be' \
  | grep --invert-match 'J.H. Jacobs, published as a book in 1990, is now on the web as' \
  | grep --invert-match 'Wulf et al. book, The Design of an Optimizing Compiler, American' \
  | grep --invert-match 'useful in writing a book, producing a product, etc., I would appreciate an' \
  | grep --invert-match 'By reading any of the many books on the topic. Here are a few of them:' \
  | grep --invert-match '0-13-155045-4. A large book containing the complete source code to a' \
  | grep --invert-match 'Good for beginners. Develops a small subset-of-C compiler through the book.' \
  | less
on the entire archive of this newsgroup.
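In hindsight, a pattern file would have made that pipeline far shorter; a small sketch with invented file names and a toy stand-in for the archive:

```shell
# Keep the exclusion phrases one per line in a file and let a single
# grep apply them all (-F: fixed strings, -v: invert, -f: patterns
# from file).
printf '%s\n' 'bookkeep' 'new compiler tools, and book reviews.' > exclude.txt

# Demo input standing in for the newsgroup archive:
printf '%s\n' 'a book on compilers' 'bookkeeping tips' \
    'new compiler tools, and book reviews.' > sample.txt

grep -i book sample.txt | grep -i -v -F -f exclude.txt
# prints: a book on compilers
```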

|"Unless you have a very specific task that needs a special purpose |
|architectural tweak to achieve your performance goal, you will want to |
|stay with more general knowledge of what compilers are looking for. |
|Follow the path of dependence analysis and you won't go wrong. |

|Hope this helps. |
|George" |

It certainly has. Thanks.

Do not forget that I am also soliciting consulting services (though I
am unsure whether the current budgets allow for consultancy services;
if you quote me prices, I can ask).

Yours sincerely,
Paul Colin Gloster

P.S. It seems to be easier to get a copy of the first edition of Bob
Morgan's "Building an Optimizing Compiler", but I suppose that the
newer edition is better. Any thoughts or tips?

P.P.S. Someone privately emailed me an offer for "Compiler Design
Theory". Firstly, we would need to be invoiced for any books we
buy. (The books would not necessarily need to be new, but an unused
hardback is preferable to a used book.) Secondly, I noticed that we
own some copies of this book and I have just glanced at it: it has a
very brief chapter on optimization, so even if that chapter is worth
reading, it is probably not worthwhile to buy another copy. People are
invited to debate this if they want: I am not adamant. This is an
impression from reading some sentences in the preface and no
sentences in the chapter.
