|Re: Chicken or the Egg? firstname.lastname@example.org (2000-10-10)|
|Re: variable length instructions, was Chicken or the Egg? debray@CS.Arizona.EDU (2000-10-12)|
|Re: variable length instructions, was Chicken or the Egg? email@example.com (2000-10-12)|
|Re: variable length instructions, was Chicken or the Egg? firstname.lastname@example.org (2000-10-15)|
|Re: variable length instructions, was Chicken or the Egg? email@example.com (Chris F Clark) (2000-10-18)|
|Re: variable length instructions, was Chicken or the Egg? firstname.lastname@example.org (Dietrich Epp) (2000-10-19)|
|Re: variable length instructions, was Chicken or the Egg? email@example.com (2000-10-19)|
|Re: variable length instructions, was Chicken or the Egg? firstname.lastname@example.org (David Thompson) (2000-10-19)|
|[1 later articles]|
|From:||debray@CS.Arizona.EDU (Saumya K. Debray)|
|Date:||12 Oct 2000 22:20:54 -0400|
|Organization:||University of Arizona CS Department, Tucson AZ|
>Instead of a two pass linker, I needed an n-pass linker. It starts out
>by assuming that all instructions could be coded with the minimal size, so
>after the first pass, you have an optimistic guess. The following passes
>enlarge instruction sizes, if needed. Repeat until no instruction needs
>to be enlarged.
>[Generating instructions with variable length addresses is a well understood
>problem. The PDP-11 Unix assembler did it in about 1973, and there was
>a paper by Tom Szymanski in the CACM about that time.
An interesting twist on this problem arises on modern processors
where the alignment of a loop (i.e., the low-order bits of the address of
the first instruction of the loop) can affect its performance.
As an example, on Pentium and P6 processors, the effectiveness of
the branch prefetch unit is maximized when the loop is 16-byte aligned.
This alignment requirement interacts with the way addresses are
encoded in variable-length instructions. E.g.:
/*I0*/ jmp L1
... <---------------- need alignment NOPs here
/*I1*/ jmp L0
The "obvious" extension of the standard algorithm for the
span-dependent-encoding problem can get into an infinite loop
in some of these situations. E.g.:
(1) we choose the smallest offset for the two jmp instructions;
(2) based on this we emit a certain no. of alignment NOPs
for the loop;
(3) we then discover that, because of the alignment NOPs,
the label L1 is now too far away from instruction I0
and that I0 therefore needs a larger address encoding.
Similarly, label L0 is now too far away from instruction
I1, so I1 needs a larger instruction encoding.
(4) because the size of the address encoding in instruction I0
has increased, we have to reduce the no. of alignment NOPs
for the loop, to maintain its alignment;
(5) reducing the no. of alignment NOPs brings the two
jump instructions closer together, making it possible for
them to use smaller address encodings;
... and so on.
Dept. of Computer Science, University of Arizona, Tucson
[Whaddaya mean "modern" processors? The 1960s vintage 360/91 also needed
loops aligned on 16 byte boundaries to run at full speed, but back then we
hand coded them in assembler. -John]
Return to the
Search the comp.compilers archives again.