Re: variable length instructions, was Chicken or the Egg?

debray@CS.Arizona.EDU (Saumya K. Debray)
12 Oct 2000 22:20:54 -0400

          From comp.compilers

Related articles
Re: Chicken or the Egg? michael@moria.de (2000-10-10)
Re: variable length instructions, was Chicken or the Egg? debray@CS.Arizona.EDU (2000-10-12)
Re: variable length instructions, was Chicken or the Egg? michael@moria.de (2000-10-12)
Re: variable length instructions, was Chicken or the Egg? vbdis@aol.com (2000-10-15)
Re: variable length instructions, was Chicken or the Egg? cfc@world.std.com (Chris F Clark) (2000-10-18)
Re: variable length instructions, was Chicken or the Egg? dietrich@216.26.55.26 (Dietrich Epp) (2000-10-19)
Re: variable length instructions, was Chicken or the Egg? michael@moria.de (2000-10-19)
Re: variable length instructions, was Chicken or the Egg? david.thompson1@worldnet.att.net (David Thompson) (2000-10-19)
[1 later articles]
| List of all articles for this month |
From: debray@CS.Arizona.EDU (Saumya K. Debray)
Newsgroups: comp.compilers
Date: 12 Oct 2000 22:20:54 -0400
Organization: University of Arizona CS Department, Tucson AZ
References: 00-10-073
Keywords: linker

<michael@moria.de> wrote:
>Instead of a two pass linker, I needed an n-pass linker. It starts out
>by assuming that all instructions could be coded with the minimal size, so
>after the first pass, you have an optimistic guess. The following passes
>enlarge instruction sizes, if needed. Repeat until no instruction needs
>to be enlarged.
...
>[Generating instructions with variable length addresses is a well understood
>problem. The PDP-11 Unix assembler did it in about 1973, and there was
>a paper by Tom Szymanski in the CACM about that time.


An interesting twist on this problem arises on modern processors
where the alignment of a loop (i.e., the low-order bits of the address of
the first instruction of the loop) can affect its performance.
As an example, on Pentium and P6 processors, the effectiveness of
the branch prefetch unit is maximized when the loop is 16-byte aligned.
This alignment requirement interacts with the way addresses are
encoded in variable-length instructions. E.g.:


          L0:
...
  /*I0*/ jmp L1
                ...
                ... <---------------- need alignment NOPs here
                LOOP CODE
                ...
  /*I1*/ jmp L0
...
          L1:


The "obvious" extension of the standard algorithm for the
span-dependent-encoding problem can get into an infinite loop
in some of these situations. E.g.:


    (1) we choose the smallest offset for the two jmp instructions;
    (2) based on this we emit a certain no. of alignment NOPs
            for the loop;
    (3) we then discover that, because of the alignment NOPs,
            the label L1 is now too far away from instruction I0
            and that I0 therefore needs a larger address encoding.
            Similarly, label L0 is now too far away from instruction
            I1, so I1 needs a larger instruction encoding.
    (4) because the size of the address encoding in instruction I0
            has increased, we have to reduce the no. of alignment NOPs
            for the loop, to maintain its alignment;
    (5) reducing the no. of alignment NOPs brings the two
            jump instructions closer together, making it possible for
            them to use smaller address encodings;


            ... and so on.
--
Saumya Debray
Dept. of Computer Science, University of Arizona, Tucson
debray@cs.arizona.edu
http://www.cs.arizona.edu/people/debray
[Whaddaya mean "modern" processors? The 1960s vintage 360/91 also needed
loops aligned on 16 byte boundaries to run at full speed, but back then we
hand coded them in assembler. -John]





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.