Re: Assembling span-dependent instructions
Thu, 28 Jul 2022 12:15:14 -0000 (UTC)

          From comp.compilers

Newsgroups: comp.compilers,comp.arch
Date: Thu, 28 Jul 2022 12:15:14 -0000 (UTC)
Organization: NNTP Server
References: 22-07-049 22-07-052
Keywords: optimize, assembler
Posted-Date: 29 Jul 2022 16:38:22 EDT

In comp.arch Kaz Kylheku <> wrote:
> On 2022-07-27, Anton Ertl <> wrote:
> > However, one can also construct cases where making the code larger can
> > reduce the minimum size of the immediate operand, e.g.:
> >
> > foo:
> > movl foo+133-bar(%rdi),%eax
> > bar:
> That's weird; what is accessed this way, relative to the code,
> and does it occur in compiler output?

Code like this may appear due to alignment, say a jump to a page or
cache-line boundary. In realistic situations one faces a much more
complex problem. Namely, on many architectures the best way to provide
constant arguments is to store the constants in memory. This leads to
"constant pools" and the problem of where to place them. One wants
constant pools as close as possible to the code, so that short offsets
can be used to access them. But for performance reasons it is
desirable to put constants in separate cache lines. Also, one needs
jumps to jump around constant pools. Some jumps occur naturally in the
program, and it is good to reuse them. But there may be unused parts
of cache lines (both for code and for constant pools). So one has to
balance the loss due to unused parts of cache lines (probably the
dominant factor), the length of instructions, and the possible
overhead of extra instructions.
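The placement trade-off above can be sketched as a single greedy pass over a function. This is a minimal illustration under invented assumptions, not any particular assembler's algorithm. It assumes a hypothetical target where a load reaches at most `MAX_REACH` bytes forward and every pool starts on a fresh `CACHE_LINE`. A `BRANCH_SIZE`-byte branch around the pool is inserted only when no unconditional jump from the program itself can be reused.

```python
from dataclasses import dataclass

# Hypothetical target parameters (assumptions, not taken from a real ISA).
MAX_REACH = 256    # a load reaches a constant at most this many bytes ahead
CACHE_LINE = 64    # each pool starts on a fresh cache line
BRANCH_SIZE = 4    # size of a branch inserted to jump around a pool

@dataclass
class Insn:
    size: int
    kind: str = "op"      # "op", "load_const", or "jump" (unconditional)
    const_size: int = 0   # size of the constant a "load_const" refers to

def pad_to_line(addr):
    """Padding bytes needed to reach the next cache-line boundary."""
    return -addr % CACHE_LINE

def place_pools(insns):
    """Greedy constant-pool placement.

    Returns a list of (addr, tag, detail) entries; for tag "const",
    detail is the address of the load that uses the constant.
    """
    layout, pending, addr = [], [], 0   # pending: (load addr, const size)

    def fits(pool_at, extra):
        # Would every pending constant still be in reach if a pool
        # (branch + padding + constants) started at pool_at?
        a = pool_at + BRANCH_SIZE
        a += pad_to_line(a)
        for load_addr, size in pending + extra:
            if a - load_addr > MAX_REACH:
                return False
            a += size
        return True

    def flush(need_branch):
        nonlocal addr
        if need_branch:                  # no natural jump available here
            layout.append((addr, "branch", BRANCH_SIZE))
            addr += BRANCH_SIZE
        pad = pad_to_line(addr)          # wasted bytes: often the dominant cost
        if pad:
            layout.append((addr, "pad", pad))
            addr += pad
        for load_addr, size in pending:
            layout.append((addr, "const", load_addr))
            addr += size
        pending.clear()

    for insn in insns:
        extra = [(addr, insn.const_size)] if insn.kind == "load_const" else []
        if pending and not fits(addr + insn.size, extra):
            flush(need_branch=True)      # must dump the pool before this insn
        layout.append((addr, insn.kind, insn.size))
        if insn.kind == "load_const":
            pending.append((addr, insn.const_size))
        addr += insn.size
        if insn.kind == "jump" and pending:
            flush(need_branch=False)     # reuse a jump already in the program
    if pending:
        flush(need_branch=False)         # assume the code ends in a return
    return layout
```

Flushing at every natural jump keeps inserted branches rare but can waste padding. A real placement pass would weigh that padding against the cost of an extra branch, as described above.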

There is an extra complication when the machine has a limited range of
offsets that can be used in a single instruction: when the needed
offset exceeds the allowed range, one has to change to an indirect
form, which needs a free register. So there is an extra interaction
with register allocation. This is particularly nasty if one needs a
free register and wants to spill some other register, but the address
constants exceed the length allowed in the instruction: in order to
free a register, one already needs a free register to hold the
address constant.
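The span-dependent choice between a direct and an indirect form is commonly resolved by iterating to a fixed point. Below is a minimal sketch that assumes a hypothetical encoding. The direct form takes `SHORT_SIZE` bytes and holds a signed 8-bit displacement measured from the end of the instruction. The indirect form loads the address constant into a scratch register and takes `LONG_SIZE` bytes. The returned set marks where register allocation would have to supply a free register.

```python
# Hypothetical encodings (assumptions for illustration, not a real ISA).
SHORT_SIZE = 2                   # direct form: signed 8-bit displacement
LONG_SIZE = 8                    # indirect form: load address into a scratch
                                 # register, then access through it
SHORT_MIN, SHORT_MAX = -128, 127

def relax(items):
    """Choose direct/indirect forms for span-dependent references.

    items: ("label", name) | ("op", size) | ("ref", target, addend), where a
    ref's displacement is target + addend - (address just past the ref).
    Returns the indices of refs that need the indirect form, i.e. the
    places where register allocation must provide a free register.
    """
    refs = [i for i, it in enumerate(items) if it[0] == "ref"]
    long_form = set()
    while True:
        # Lay out the code with the current choice of forms.
        addr, labels, end = 0, {}, {}
        for i, it in enumerate(items):
            if it[0] == "label":
                labels[it[1]] = addr
            elif it[0] == "op":
                addr += it[1]
            else:
                addr += LONG_SIZE if i in long_form else SHORT_SIZE
                end[i] = addr

        def disp(i):
            _, target, addend = items[i]
            return labels[target] + addend - end[i]

        # Promote every direct ref whose displacement no longer fits;
        # growing one ref can push other refs out of range.
        grew = {i for i in refs if i not in long_form
                and not SHORT_MIN <= disp(i) <= SHORT_MAX}
        if not grew:
            return long_form
        long_form |= grew        # never shrink: guarantees termination
```

Forms only ever grow, which guarantees termination but is not always minimal. Consider the quoted example `movl foo+133-bar(%rdi),%eax`, modelled here as `relax([("label", "foo"), ("ref", "foo", 133), ("label", "bar")])`. The short form would leave a displacement of 131, so the reference is promoted. Once promoted, its displacement (125 in this model) would fit the short form again, but the sketch keeps the long form.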

i386 allows a rather easy solution to such problems, because loads and
jumps can reach any location and most instructions accept 32-bit
constant (immediate) arguments. On x86_64 (that is, in 64-bit mode)
the situation is more interesting, because immediates and offsets are
(with few exceptions) limited to 32 bits, so one can no longer reach
the whole address space. But current practice makes it easy, if
wasteful: each part of a program (the main executable and each shared
library) is limited to 2 GB, so that 32-bit offsets are enough, and
accesses to other parts are indirect. 32-bit ARM has most of these
problems: offsets are quite limited and most constants need to be
loaded from memory. ARM instructions are of fixed length, but when an
offset is too big to fit into a single instruction, one has to use an
alternative sequence of several instructions (and deal with register
allocation). The Z architecture (the modern descendant of the IBM 360)
has such problems too: there are variants of instructions with
different lengths, but even the longest variant has a limited range of
offsets. At least some versions of the Z architecture had a severe
penalty for simultaneously accessing the same cache line for
instruction fetch and data access, so putting constant pools in
separate cache lines was very important.
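The range limits mentioned above can be checked mechanically. Below is a sketch of three such tests based on the standard A32 and x86-64 encodings; the function names are hypothetical.

- An A32 data-processing immediate is an 8-bit value rotated right by an even amount.
- An A32 `LDR` literal reaches ±4095 bytes from PC, which reads as the instruction address plus 8.
- An x86-64 rel32 displacement, measured from the end of the instruction, must fit in a signed 32-bit field.

```python
def arm_imm_encodable(value):
    """True if value is an A32 modified immediate: an 8-bit constant
    rotated right by an even amount (0, 2, ..., 30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        v = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if v <= 0xFF:
            return True
    return False

def ldr_literal_reaches(load_addr, const_addr):
    """A32 LDR (literal): 12-bit offset with add/subtract, relative to
    PC, which reads as the instruction address plus 8 (word-aligned load)."""
    return abs(const_addr - (load_addr + 8)) <= 4095

def fits_rel32(insn_end, target):
    """x86-64 rel32: the displacement from the end of the instruction
    must fit in a signed 32-bit field."""
    return -2**31 <= target - insn_end < 2**31
```

For example, `0xFF000000` and `0xF000000F` are encodable as immediates, while `0x101` and `0x1FE` are not. The latter would have to be loaded from memory or built from several instructions.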

                                                            Waldek Hebisch