Re: instruction bundling (scheduling?)

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Sun, 06 Apr 2008 16:20:41 GMT

          From comp.compilers

Related articles
instruction bundling (scheduling?) kevin.phillips83@yahoo.com (kphillips) (2008-04-04)
Re: instruction bundling (scheduling?) anton@mips.complang.tuwien.ac.at (2008-04-06)
Re: instruction bundling (scheduling?) kevin.phillips83@yahoo.com (kphillips) (2008-04-09)
Re: instruction bundling (scheduling?) andreybokhanko@gmail.com (2008-04-13)
Re: instruction bundling (scheduling?) kamalpr@gmail.com (IndianTechie) (2008-04-14)
Re: instruction bundling (scheduling?) kevin.phillips83@yahoo.com (kphillips) (2008-04-15)
Re: instruction bundling (scheduling?) SidTouati@inria.fr (Sid Touati) (2008-04-22)
| List of all articles for this month |
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Sun, 06 Apr 2008 16:20:41 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 08-04-017
Keywords: optimize
Posted-Date: 06 Apr 2008 14:09:57 EDT

kphillips <kevin.phillips83@yahoo.com> writes:
>I'm currently experimenting with compiler writing and have so far
>produced assembly code with basic-block register allocation (using
>liveness analysis). I have access to an Itanium machine so I'm now
>generating IA-64 assembly, however none of the instructions are being
>bundled yet (producing explicit stops at the moment).
>
>I have the following two queries:
>
>1. All list scheduling I've seen deal with latency. What I require is
>using constraints in choosing instructions for each bundle. Does this
>still fall under the 'instruction scheduling' umbrella


You may refer to two different problems:


1) Grouping: Where to put the stops (and minimize the stops to
minimize the execution time on a wide-issue machine).


2) Bundling: How to arrange instructions to minimize the number of
bundles needed, considering the encoding restrictions of the
architecture (and possibly minimize the execution time on a
narrow-issue machine).


The goals you have for these problems may sometimes be at odds with
each other: E.g., consider an instruction that could still be in the
same group, but a stop is needed afterwards; and if you put the stop
afterwards, that causes additional empty instruction slots (resulting
in more bundles) because of IA-64s encoding restrictions, whereas if
you put the stop before, you may need another stop sooner than
otherwise.


In any case, I would deal with these problems in list scheduling, but
of course you would need additional code to represent the bundling
rules: Among the currently ready instructions, you choose one that
fits into the current bundle. You can do a bit better by looking at
the ready instructions all at once, and matching them with the
available bunding encodings, and selecting a set that maximizes bundle
utilization. You insert a stop before you schedule an instruction
that has a register dependency (or whatever the grouping rules
specify) on an earlier instruction, if there is no stop after that
instruction yet. That approach puts stops as late as possible, so it
optimizes grouping.


It's possible to do better (after all, already list scheduling is
suboptimal in general), but that should be good enough for a start.


>2. Since register allocation uses liveness properties to re-use
>registers, it counter-acts the parallelism in instruction bundling.
>
>a. If I remove the liveness analysis, then spill code will have to be
>added, increasing the overall cycle length and reducing performance.
>
>b. if on the other hand, register allocation is done after scheduling,
>if spill code is still needed then (somehow?), the whole block must be
>re-scheduled.


Given the large number of registers in IA-64, for simplicity just
schedule first and allocate registers later; spilling and thus
rescheduling will be rare. Also, most liveness analysis and register
allocation algorithms work on scheduled code anyway (if they work
before instruction scheduling, they use the order coming out of
earlier phases).


>There are many academic literature on this subject, but none deals
>with this at this level of simplicity (basic block) level as far as
>I'm concerned.


A nice paper on instruction scheduling and register allocation for
basic blocks is:


@InProceedings{goodman&hsu88,
    author = "James R. Goodman and Wei-Chung Hsu",
    title = "Code Scheduling and Register Allocation in Large
Basic Blocks",
    booktitle = "International Conference on Supercomputing",
    year = "1988",
    pages = "442--452",
    annote = "After an overview of the phase ordering problems in
instruction scheduling and register allocation two
algorithms are introduced: Integrated Prepass
Scheduling keeps track of the number of registers
left and switches between scheduling to minimize
pipeline stalls and scheduling to minimize register
usage accordingly. A variation of this algorithm also
spills registers in certain circumstances. DAG-Driven
Register Allocation is to be used with a postpass
scheduler and tries to allocate the registers without
generating new dependencies. If this cannot be
achieved, the register is chosen in a way that
minimizes the path length of the additional paths.
The two algorithms (combined with a register
allocator and an instruction scheduler, respectively)
perform better than the usual prepass, postpass
or two-pass approaches."
}


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.