|Block Scheduling: determ vs nondeterm PAULLONG@delphi.com (1994-06-05)|
|Re: Block Scheduling: determ vs nondeterm firstname.lastname@example.org (1994-06-13)|
|Re: Block Scheduling: determ vs nondeterm email@example.com (1994-06-15)|
|Re: Block Scheduling: determ vs nondeterm firstname.lastname@example.org (1994-06-16)|
|Re: Block Scheduling: determ vs nondeterm chase@Think.COM (1994-06-16)|
|Re: Block Scheduling: determ vs nondeterm email@example.com (1994-06-20)|
|From:||firstname.lastname@example.org (Dirk Grunwald)|
|Organization:||University of Colorado at Boulder|
|Date:||Mon, 20 Jun 1994 01:58:27 GMT|
A similar greedy algorithm is used by Wen-mei Hwu and Pohua P. Chang in:

@inproceedings{hwu89,
  AUTHOR = "Wen-mei W. Hwu and Pohua P. Chang",
  TITLE = "Achieving High Instruction Cache Performance with an Optimizing Compiler",
  BOOKTITLE = ISCA89,
  YEAR = 1989,
  PAGES = "242--251",
  ORGANIZATION = ACM,
  PUBLISHER = ACM
}
For each procedure, they take the entry block, then ``grow it down,''
following the most common execution path. They then select the next block
and ``grow it down'' and ``grow it up.'' In later papers, some of which
should be in SP&E by now, they show that by ``tail duplication'' of what
are essentially traces (which they call ``superblocks''), they can greatly
reduce Icache miss rates and improve branch prediction.
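A minimal sketch of this kind of greedy trace growing, assuming the CFG is
given as a map from each block to its profiled outgoing edges (the block
names and data structure here are illustrative, not from the paper):

```python
def grow_traces(entry, succs):
    """Greedily lay out basic blocks: start from the entry block and
    repeatedly 'grow down' along the most frequently taken edge, then
    seed a new trace from the hottest block not yet placed.
    succs maps block -> list of (successor, edge_count)."""
    placed = []
    seen = set()
    # Seed order: entry first, then remaining blocks by total incoming weight.
    weight = {}
    for b, edges in succs.items():
        for s, w in edges:
            weight[s] = weight.get(s, 0) + w
    seeds = [entry] + sorted(succs, key=lambda b: -weight.get(b, 0))
    for seed in seeds:
        b = seed
        while b is not None and b not in seen:
            placed.append(b)
            seen.add(b)
            # Follow the most common execution path downward.
            nxt = [(s, w) for s, w in succs.get(b, []) if s not in seen]
            b = max(nxt, key=lambda e: e[1])[0] if nxt else None
    return placed
```

On a diamond-shaped CFG where A->B->D is the hot path and A->C->D the cold
one, this yields the layout A, B, D, C: the hot path falls through and the
cold block is pushed to the end.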
Pettis and Hansen used a similar technique:

@inproceedings{pettis90,
  AUTHOR = "Karl Pettis and Robert C. Hansen",
  TITLE = "Profile Guided Code Positioning",
  BOOKTITLE = pldi90,
  YEAR = 1990,
  MONTH = jun,
  PAGES = "16--27",
  ORGANIZATION = "ACM",
  PUBLISHER = "ACM"
}
Pettis and Hansen didn't duplicate code, but they did ``segregate'' the
frequently executed traces from the infrequently executed traces. This
improved the Icache miss rate and probably the TLB miss and paging rate.
My student, Brad Calder, and I compared several such techniques and
devised one that performs an exhaustive search over the set of most
commonly executed blocks. This has (marginally) better behavior than the
previous two techniques. We were mainly interested in branch prediction,
and compared the performance for several static prediction mechanisms and
some dynamic mechanisms.
You want to balance having the ``fall through'' path be the most common
path against the desire to eliminate branches. I have read other papers
(whose citations I've forgotten) that attempt to reduce unnecessary
jumps by duplicating code.
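The trade-off can be made concrete by counting dynamic taken branches for
a candidate layout (a toy cost model, not from any of the cited papers):
an edge whose target immediately follows its source falls through for free,
so placing the hotter successor next minimizes taken branches.

```python
def taken_branches(layout, succs):
    """Count dynamic taken branches for a layout: an edge is 'taken'
    unless its target immediately follows the source block (fall-through).
    succs maps block -> list of (successor, edge_count)."""
    pos = {b: i for i, b in enumerate(layout)}
    taken = 0
    for b, edges in succs.items():
        for s, w in edges:
            if pos.get(s) != pos[b] + 1:  # not in the fall-through slot
                taken += w
    return taken
```

For the diamond CFG with a 90/10 split at A, the layout A, B, D, C costs
20 taken branches per 100 trips, while A, C, D, B costs 180.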
The implications for software pipelining are interesting. We applied these
transformations after the program was compiled, so we didn't have to worry
about interactions with it.
The paper I mentioned will be in the next ASPLOS, and we should have a
camera-ready version in a couple of weeks if you want it sooner.