Re: Modern compilers for ye olde architectures

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Fri, 15 Oct 2021 07:37:30 GMT

          From comp.compilers

Related articles
[3 earlier articles]
Re: Modern compilers for ye olde architectures derek@NOSPAM-knosof.co.uk (Derek Jones) (2021-10-06)
Re: Modern compilers for ye olde architectures laguest@archeia.com (Luke A. Guest) (2021-10-06)
Re: Modern compilers for ye olde architectures anton@mips.complang.tuwien.ac.at (2021-10-06)
Re: Modern compilers for ye olde architectures theom+news@chiark.greenend.org.uk (Theo) (2021-10-06)
Re: Modern compilers for ye olde architectures laguest@archeia.com (Luke A. Guest) (2021-10-06)
Re: Modern compilers for ye olde architectures pkk@spth.de (Philipp Klaus Krause) (2021-10-06)
Re: Modern compilers for ye olde architectures anton@mips.complang.tuwien.ac.at (2021-10-15)
Re: Modern compilers for ye olde architectures pkk@spth.de (Philipp Klaus Krause) (2021-10-18)
Re: Modern compilers for ye olde architectures pkk@spth.de (Philipp Klaus Krause) (2021-10-18)
Re: Modern compilers for ye olde architectures pkk@spth.de (Philipp Klaus Krause) (2021-10-18)
Re: Modern compilers for ye olde architectures gah4@u.washington.edu (gah4) (2021-10-21)
Re: Modern compilers for ye olde architectures 480-992-1380@kylheku.com (Kaz Kylheku) (2021-10-22)
Re: Modern compilers for ye olde architectures dave_thompson_2@comcast.net (2021-11-14)
| List of all articles for this month |

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Fri, 15 Oct 2021 07:37:30 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 21-10-007 21-10-012 21-10-015
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="63942"; mail-complaints-to="abuse@iecc.com"
Keywords: architecture, code
Posted-Date: 16 Oct 2021 13:45:08 EDT

Philipp Klaus Krause <pkk@spth.de> writes:
>See "Optimal Register Allocation in Polynomial Time". A graph-coloring
>approach that can handle irregularities well (as long as there are not
>too many registers). SDCC uses such a register allocator for some
>backends, including z80.


Interesting paper. I had trouble following the theoretical part, but
looking at the practical part and taking the part that I understood
about the theory, you try out all possible assignments of all
registers to the maximum clique (live ranges that are alive at the
same time), and because the number of registers is a constant (for a
given architecture), this is a polynomial.


You discuss splitting live ranges at control-flow-graph boundaries,
but it seems to me that in some cases you want to split live ranges
between instructions within a control-flow node; this does not change
the complexity-theoretic result, but of course the implementation.


If only the maximum clique size plays a role, it's as if the
assignments to the cliques are independent; but then you have to
reconcile the assignments for different cliques by introducing reg-reg
move instructions, and take these costs into account, and optimize
them, too; so I don't see that the cliques can be treated as
independent. And you probably don't, because there are additional
factors in the complexity. In any case, I am missing something, and
maybe you have an idea what it is.


I am wondering about one thing in the empirical results in your paper:
Why is the code size not monotonously falling with increased numbers
of assignments? Are these independent runs with different
(pseudo-random) assignments?


I had some questions which were mostly answered by the paper, but
maybe you can offer additional insights:


* Am I right that earlier register allocators were bad for irregular
    register sets, and that's why general-purpose registers won once
    compilers became dominant? Why did general-purpose registers become
    dominant?


The advantage of general-purpose registers is given in that paper as
reducing the complexity of the algorithm by a factor of:


(2*(tree-width(G)+1)*#registers)!


E.g., for tree-width 2 and 9 registers, it's 54! = 2.30843697*10^71


Which poses the question: In your empirical work you stopped at 10^8
assignments (in some cases, less). How did you get provably optimal
assignments on the Z80 with its 9 registers?


* What are the key points why your work can deal with irregular
    register sets, and earlier approaches are pretty bad at that?


It seems to me that you use the CPU power available now to try out
many different assignments, while earlier work has balked at that.


* Do you have any idea why no good approach for dealing with irregular
    instruction sets has been found in, say, the 1970s and 1980s when
    irregular register sets were more common (e.g. on the Z80 and the
    8086).


Your approach is an (ideally exhaustive) search that uses more CPU
power (and memory?) than was available then. At the time, one would
have resorted to heuristics, but apparently no general effective
heuristics have been found.


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.