Related articles |
---|
[9 earlier articles] |
Re: Graph colouring and local register allocation preston.briggs@gmail.com (preston.briggs@gmail.com) (2007-12-01) |
Re: Graph colouring and local register allocation gneuner2@/comcast.net (George Neuner) (2007-12-01) |
Re: Graph colouring and local register allocation pertti.kellomaki@tut.fi (=?ISO-8859-1?Q?Pertti_Kellom=E4ki?=) (2007-12-03) |
Re: Graph colouring and local register allocation SidTouati@inria.fr (Sid Touati) (2007-12-04) |
Re: Graph colouring and local register allocation torbenm@app-1.diku.dk (2007-12-05) |
Re: Graph colouring and local register allocation SidTouati@inria.fr (Sid Touati) (2007-12-08) |
Re: Graph colouring and local register allocation anton@mips.complang.tuwien.ac.at (2007-12-09) |
Re: Graph colouring and local register allocation gah@ugcs.caltech.edu (glen herrmannsfeldt) (2007-12-09) |
Re: Graph colouring and local register allocation SidTouati@inria.fr (Sid Touati) (2007-12-10) |
From: | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
Newsgroups: | comp.compilers |
Date: | Sun, 09 Dec 2007 18:16:35 GMT |
Organization: | Institut fuer Computersprachen, Technische Universitaet Wien |
References: | 07-10-103 07-11-019 07-11-063 07-11-091 07-12-003 07-12-005 07-12-007 07-12-010 07-12-013 07-12-019 |
Keywords: | parallel, registers |
Posted-Date: | 09 Dec 2007 15:49:25 EST |
Sid Touati <SidTouati@inria.fr> writes:
>1/ OoO execution is a heuristics (a basic list scheduler). And we know
>that any list scheduler can be twice the optimal schedule...
I don't think that the theoretical worst-case is very interesting in
most situations.
What's more relevant is that OoO execution typically prioritizes the
oldest instruction instead of using the better heuristics that
software schedulers use (e.g., longest path).
BTW, list scheduling with the right priority function should be
optimal, no? Just use the optimal schedule to determine the priority
for list scheduling.
>2/ Do not forget that register OOO have a limited instruction window.
>Nowadays, codes are large, and basic blocks are larger that instruction
>windows.
Not for most code. That's one of the reasons why OoO execution has
succeeded: software scheduling does not work well across basic block
boundaries in general, so hardware scheduling proved useful despite
its not-so-great heuristics.
Of course, there is some special numerical code that has been expanded
in very long basic blocks, and there software scheduling probably
outdoes hardware scheduling by a significant margin.
>3/ Register renaming is limited by the number of hidden hardware
>registers...
>4/ The commit stages of the pipelines are still in-order.
Yes, so hardware schedulers have a limited window; that does not add
anything to your point 2/.
>I suggest you an experience to check if OoO execution and register
>renaming are restricted in practice. For instance, take a set of loops
>with a reasonable amount of ILP. Choose loops with high data locality
>in order to avoid data cache problems. Or even better, choose compute
>bound loops (not memory bound). Then unroll these loops with a high
>unrolling degree in order to produce large loop bodies.
That demonstrates my "not for most code" point. Sure, for every
engineering solution one can find pathological cases where it doesn't
work well, and sometimes such cases even occur in practice (IIRC FFTW
has some very large basic blocks), but still OoO execution, and the
usual way of dealing with register allocation and instruction
scheduling work not too badly for most code, and when I see problems,
they are quite different from the ones you have discussed (but then I
don't deal with "scientific" code).
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Return to the
comp.compilers page.
Search the
comp.compilers archives again.