Re: Graph colouring and local register allocation

Sid Touati <SidTouati@inria.fr>
Mon, 10 Dec 2007 11:01:33 +0100

From comp.compilers

Related articles
[11 earlier articles]
Re: Graph colouring and local register allocation pertti.kellomaki@tut.fi (=?ISO-8859-1?Q?Pertti_Kellom=E4ki?=) (2007-12-03)
Re: Graph colouring and local register allocation SidTouati@inria.fr (Sid Touati) (2007-12-04)
Re: Graph colouring and local register allocation torbenm@app-1.diku.dk (2007-12-05)
Re: Graph colouring and local register allocation SidTouati@inria.fr (Sid Touati) (2007-12-08)
Re: Graph colouring and local register allocation anton@mips.complang.tuwien.ac.at (2007-12-09)
Re: Graph colouring and local register allocation gah@ugcs.caltech.edu (glen herrmannsfeldt) (2007-12-09)
*Re: Graph colouring and local register allocation SidTouati@inria.fr (Sid Touati)* (2007-12-10)**

| List of all articles for this month |

From:	Sid Touati <SidTouati@inria.fr>
Newsgroups:	comp.compilers
Date:	Mon, 10 Dec 2007 11:01:33 +0100
Organization:	I.N.R.I.A Rocquencourt
References:	07-10-103 07-11-019 07-11-063 07-11-091 07-12-003 07-12-005 07-12-007 07-12-010 07-12-013 07-12-019 07-12-020
Keywords:	registers, parallel
Posted-Date:	10 Dec 2007 10:42:13 EST

Anton Ertl a icrit :
mal schedule...
>
> I don't think that the theoretical worst-case is very interesting in
> most situations.

The problem is that the number of situations is infinite. So, saying
"most of the codes", "most of the situations" or "representative
codes" has no absolute signification. Everyone has its own definition
of "practice". In my opinion, a compiler should be designed for
current codes, and also for future codes the do not already
exist. Designing a compiler for existing codes is meaningless for me
(it's my own opinion).

Usually, architects adds microarchitectural hardware to accelerate
current codes, not future codes. Optimizing compiler should (try) to
optimize all codes, current and future ones. This makes optimizing
compilation more difficult in general than engineering a processor.

> What's more relevant is that OoO execution typically prioritizes the
> oldest instruction instead of using the better heuristics that
> software schedulers use (e.g., longest path).

If you can read our previous citations, we showed in very simple
"practical" existing BLAS1 loops, and even if data are in the L1/L2
cache, OoO schedulers can be:
- 21x far from the optimal in alpha 21264
- at least twice the optimal in power 4.
- twice the optimal on itanium2 (here, the processor is in-order)

We have taken very small simple loops, where data reside in caches, and
a large amount of ILP exist. I cannot imagine codes simpler than that.

Both the compiler (with a lot of compilation swicths) and the OoO
execution fail to reach peak performance

> BTW, list scheduling with the right priority function should be
> optimal, no? Just use the optimal schedule to determine the priority
> for list scheduling.

Of course not, it is proved that any (polynomial time) list scheduler
(for any priority) can be twice far from the optimal. That is, for any
variant of list scheduler, you can be twice the optimal. This is a
mathematical fact, not an opinion.
This is because any variant of list scheduler do not delay a ready
instruction.

>
> Not for most code. That's one of the reasons why OoO execution has
> succeeded: software scheduling does not work well across basic block
> boundaries in general, so hardware scheduling proved useful despite
> its not-so-great heuristics.

See my previous opinion about "most of the codes".

>
> That demonstrates my "not for most code" point. Sure, for every
> engineering solution one can find pathological cases where it doesn't
> work well, and sometimes such cases even occur in practice (IIRC FFTW
> has some very large basic blocks), but still OoO execution, and the
> usual way of dealing with register allocation and instruction
> scheduling work not too badly for most code, and when I see problems,
> they are quite different from the ones you have discussed (but then I
> don't deal with "scientific" code).

Yes, that comes to opinions about what is "too bad" and what is
"satisfactory". Depending on the objectives you target, you may be
satisfied or not by a solution.

When I compile an application on my desktop station, I am always
satisfied by the compiler. Buy when I compile a small embedded code, I
am not satisfied, since I always find a great space for improvement at
backend level.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Graph colouring and local register allocation

Sid Touati <SidTouati@inria.fr>Mon, 10 Dec 2007 11:01:33 +0100

Sid Touati <SidTouati@inria.fr>
Mon, 10 Dec 2007 11:01:33 +0100