From: | James Harris <james.harris.1@googlemail.com> |
Newsgroups: | comp.compilers |
Date: | Wed, 14 May 2008 11:37:05 -0700 (PDT) |
Organization: | Compilers Central |
References: | 08-05-031 08-05-038 08-05-051 |
Keywords: | code, performance |
Posted-Date: | 14 May 2008 15:41:16 EDT |
On 14 May, 11:38, James Harris <james.harri...@googlemail.com> wrote:
...
> By the way, the idea of duplicating code is intended to be quite
> widespread in order to achieve speed.
> [Are you considering cache behavior? If you duplicate a lot of code,
> it becomes less likely that it'll fit in the cache. -John]
Well, I'm aware of the issue, but I don't yet have a general formula
for deciding when to generate alternate copies of code and when not
to. What I can say:
1. This is a code generation issue. The viability of using alternate
copies depends in large part on the target CPU. As such, the IR is to
contain only the simple loop code.
2. As long as the same variant of the function code is called
repeatedly in an inner loop, the benefits of code caching should still
apply. The natural extension of calling the same variant each time is
inlined code, which will be appropriate in some cases.
Is it worth it? Although I cannot quantify the gains yet, it is easy to
demonstrate cases where loops with few iterations run faster when
encoded as non-loop instruction sequences. This applies even when a
test and branch is included (which is not needed if the iteration
count is constant or predictable). If the test and branch is needed,
viability depends on correct branch prediction. In terms of ease of
use, I want the programmer who uses the language to be able to code
simple loops without having to think about special per-CPU cases to
gain speed, and to be able to leave efficient code generation to the
compiler.
The main intended benefit is that the source code can be written
independently of word length and will thus scale to arbitrary-length
data (i.e. it can be more general) without sacrificing performance in
the smaller cases.
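
To make the idea concrete, here is a rough sketch in C of what the
generated variants might amount to. It is only an illustration, not
output from any real code generator: the function names
(add_words_loop, add_words_2, add_words) are hypothetical, and 32-bit
words with a 64-bit intermediate are assumed. The programmer would
write only the simple loop; the straight-line copy and the test and
branch that selects it stand in for what the compiler would emit for a
given target.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic form: the simple loop the programmer writes. It works for
 * any word count n and returns the carry out of the top word. */
static uint32_t add_words_loop(uint32_t *dst, const uint32_t *a,
                               const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        dst[i] = (uint32_t)sum;          /* low 32 bits */
        carry = (uint32_t)(sum >> 32);   /* 0 or 1 */
    }
    return carry;
}

/* Hypothetical straight-line variant for n == 2: same arithmetic,
 * but no loop counter, no compare and no backward branch. */
static uint32_t add_words_2(uint32_t *dst, const uint32_t *a,
                            const uint32_t *b)
{
    uint64_t s0 = (uint64_t)a[0] + b[0];
    uint64_t s1 = (uint64_t)a[1] + b[1] + (s0 >> 32);
    dst[0] = (uint32_t)s0;
    dst[1] = (uint32_t)s1;
    return (uint32_t)(s1 >> 32);
}

/* The test and branch: pick the duplicated copy when the count
 * matches, otherwise fall back to the general loop. If n is a
 * compile-time constant, this test disappears entirely. */
uint32_t add_words(uint32_t *dst, const uint32_t *a,
                   const uint32_t *b, size_t n)
{
    if (n == 2)
        return add_words_2(dst, a, b);
    return add_words_loop(dst, a, b, n);
}
```

Whether the extra copy pays for itself depends on the target CPU, on
how well the selecting branch is predicted, and on the code cache
pressure raised above, so this shows the shape of the transformation
and not a claim about its speed.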
Don't code generators do this kind of thing as a matter of course?
--
James
[Yes, loop unrolling and software pipelining are well known
optimizations. -John]