Re: Block Scheduling: determ vs nondeterm

chase@Think.COM (David Chase)
Thu, 16 Jun 1994 17:16:37 GMT

          From comp.compilers

Related articles
Block Scheduling: determ vs nondeterm PAULLONG@delphi.com (1994-06-05)
Re: Block Scheduling: determ vs nondeterm preston@noel.cs.rice.edu (1994-06-13)
Re: Block Scheduling: determ vs nondeterm paullong@delphi.com (1994-06-15)
Re: Block Scheduling: determ vs nondeterm preston@noel.cs.rice.edu (1994-06-16)
Re: Block Scheduling: determ vs nondeterm chase@Think.COM (1994-06-16)
Re: Block Scheduling: determ vs nondeterm grunwald@foobar.cs.colorado.edu (1994-06-20)
| List of all articles for this month |

Newsgroups: comp.compilers
From: chase@Think.COM (David Chase)
Keywords: optimize
Organization: Thinking Machines Corporation, Cambridge MA, USA
References: 94-06-060 94-06-089
Date: Thu, 16 Jun 1994 17:16:37 GMT

Another algorithm for rearranging blocks (the one I prefer) first
estimates (either through static program structure or profiling) edge
execution frequencies, blows the blocks apart, sorts the edges in order of
decreasing frequency, and then proceeds to pick glue-able pairs of blocks
from the edge list. (Basically, a greedy algorithm). I never got around
to figuring out whether this is optimal or not, or how hard optimal would
be if this isn't optimal, but it generally does a decent job of
rearranging code to squeeze non-loop code out of loop interiors, and a
decent job of rotating loops to reduce the number of branches into them.
You can use a modified union-find algorithm to keep track of which block
is where, and whether it is eligible for gluing or not. I think it should
be possible, to modify this to (also) replicate blocks of code if they
aren't too large, but I never got around to doing this either.


There are several things to watch out for when doing this.


1. loop rotation can interfere with software pipelining. A typical
      resulting loop looks like:


          goto test_block;


      body_block:
          ...
      test_block:
          if (iterate()) goto body_block;


      after_loop_block:


      This is "better" for loops because there is only one branch, not
      two (on machines were that matters), but for software pipelining
      you'd prefer to see:


              if (! iterate()) goto after_loop_block;


        body_block:
                ...
        test_block: /* Fused to its predecessor */
                if (iterate()) goto body_block;


        after_loop_block:


      This is a nice single-block loop -- easier to unroll/pipeline.


2. sometimes you want to keep (or even generate!) funny branchy idioms
      for later replacement by a peepholer, even if that conflicts with
      what edge frequencies might otherwise recommend.


3. It helps not to hash up the code order too much. Some people do care.
      It turns out that a good tie-breaker for edge frequency is


        (unsigned) (original_target_block_index - original_source_block_index)


      Choose the smallest value to break a tie.
      Thus, a branch from block N to N+1 scores (N+1) - N which is 1, the
      best you can do, since a block cannot be glued to itself. A backwards
      branch, say from N+1 to N, yields a score of 0xffffffff (unsigned,
      remember) which is a very large number.


      This one change makes a big difference (for the better) in the
      appearance of the resulting code.


David Chase, speaking for myself
Thinking Machines Corp.


--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.