Re: Interpreters and caller-saved registers

Thomas Koenig <>
Sun, 15 Oct 2023 19:52:45 -0000

          From comp.compilers


From: Thomas Koenig <>
Newsgroups: comp.compilers
Date: Sun, 15 Oct 2023 19:52:45 -0000
Organization: Compilers Central
References: 23-10-001
Keywords: interpreter, optimize
Posted-Date: 15 Oct 2023 15:54:53 EDT

[Replying to comp.compilers as this is more pertinent there] <> wrote:

> asm("":"=X"(s2))
> This tells gcc that the asm statement writes to s2, and thus kills it,
> but it actually does not generate any assembly language.
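
For readers who have not seen this idiom, here is a minimal sketch of how such a "kill" can look in GNU C. The macro name KILL and the functions helper() and step() are made up for illustration and are not taken from Gforth; the only part that comes from the quoted text is the empty asm with an "=X" output operand. After the kill, the variable has no defined value and must be assigned again before it is read.

```c
/* Hypothetical macro name: an empty asm with an output operand tells
   gcc that v is overwritten here, yet no instruction is emitted.
   __asm__ is used instead of asm so it also works with -std=c11. */
#define KILL(v) __asm__("" : "=X"(v))

/* Stand-in for a function called from a VM instruction (assumption). */
static long helper(long x)
{
    return x * 2;
}

long step(long a)
{
    long s2 = a + 1;       /* value cached in a local, as a VM register would be */
    long r = helper(s2);   /* last use of the old s2 */
    KILL(s2);              /* old value of s2 is dead from here on */
    s2 = r + 3;            /* must be assigned again before the next read */
    return s2;
}
```

Where exactly the kill goes relative to the call is what decides whether gcc has to keep the value alive across the call, so the placement above is only one possibility.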


> Unfortunately, gcc-11.4 also introduced two additional redundant move
> instructions in every VM instruction, and Bernd Paysan reported that
> gcc-12 and gcc-13 introduced even more superfluous code in every VM
> instruction.

It is well known that compilers in general, and gcc specifically,
often generate superfluous register moves; there are quite a few PRs
in gcc's bug database on this. I have submitted a few of them myself,
one of which includes compiler-generated code like

                movq %rdx, %rsi
                movq %rax, %rdx
                movq %rcx, 8(%rdi)
                movq %rsi, %rax
                movq %rdx, 16(%rdi)
                movq %rax, (%rdi)

where it is obvious to anybody who can read assembly that the
register moves are unneeded (although they are likely to
be zero-cycle operations because of register renaming).
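
To make the claim about the snippet checkable, here is a small sketch in C that models the four registers involved and compares the memory written by the six quoted instructions with the memory written by three plain stores. The type Regs and the function names are my own illustration. Note the assumption: this only compares memory, and the two versions do leave different values in %rax, %rdx and %rsi, so the moves are only removable if nothing after the snippet needs the shuffled register contents.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the registers used in the snippet (names mirror x86-64). */
typedef struct { uint64_t rax, rcx, rdx, rsi; } Regs;

/* The six instructions as quoted; mem[] stands for the 24 bytes at (%rdi). */
static void as_compiled(Regs r, uint64_t mem[3])
{
    r.rsi = r.rdx;     /* movq %rdx, %rsi     */
    r.rdx = r.rax;     /* movq %rax, %rdx     */
    mem[1] = r.rcx;    /* movq %rcx, 8(%rdi)  */
    r.rax = r.rsi;     /* movq %rsi, %rax     */
    mem[2] = r.rdx;    /* movq %rdx, 16(%rdi) */
    mem[0] = r.rax;    /* movq %rax, (%rdi)   */
}

/* The same three stores, reading the original registers directly. */
static void stores_only(Regs r, uint64_t mem[3])
{
    mem[1] = r.rcx;    /* movq %rcx, 8(%rdi)  */
    mem[2] = r.rax;    /* movq %rax, 16(%rdi) */
    mem[0] = r.rdx;    /* movq %rdx, (%rdi)   */
}

/* Returns 1 if both sequences leave identical memory for these inputs. */
int same_memory(uint64_t rax, uint64_t rcx, uint64_t rdx)
{
    Regs r = { rax, rcx, rdx, 0 };
    uint64_t a[3] = { 0 }, b[3] = { 0 };

    as_compiled(r, a);
    stores_only(r, b);
    return memcmp(a, b, sizeof a) == 0;
}
```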

However, if this got worse between releases, this is a regression.
Those get higher priority for fixing. So, if it is feasible
to generate a reduced test case (for which cvise, for example,
is an excellent tool), filing a bug report would be a good thing.

> This is similar to what we have seen from gcc-3.0 for
> Gforth at that time, and what we have seen from clang last we tried
> it.

> I tried to work around this issue by having the kills only at the end
> of VM instructions that perform a call, and indeed, that worked for
> gcc-11.4. However, gcc-12 and gcc-13 still produced bad code.
> Finally Bernd Paysan had the right idea and added -fno-tree-vectorize
> to the list of options that we use to avoid gcc shenanigans, and now
> we can also use this idea with gcc-12 and gcc-13.

That is strange, and would give valuable hints for investigating
this regression.

This sort of code is an example of the contradictions in today's
compiler technology. On the one hand, compilers do amazing optimizations
on large amounts of code which no programmer could hope to match
while staying productive. On the other hand, it is very common
to see glaring inefficiencies when one looks at even small chunks
of code.

(A good assembler programmer can often beat compiler-generated
code by a factor of two or more, especially if SIMD is involved,
but SIMD is really hard to generate code for).

So far, nobody has found an algorithm for "just remove the
silliness" from compiled programs. Maybe it would be feasible to
run some peephole optimizations as a last pass which could improve
code like that above, but that might also be difficult in the
more general case where registers are reused in other basic blocks
(which would mean just to redo the register allocation).
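
As a rough illustration of both the idea and its limits, here is a sketch of a block-local cleanup on a toy instruction list. Everything in it (the Insn type, the four-register file, the function drop_shuffles and the live_out bitmask) is hypothetical and has nothing to do with gcc's internals. It tracks which block-entry value each register holds, deletes all register-to-register moves, and lets each store read the entry register directly. It gives up as soon as a register needed after the block ends up holding a shuffled value, which is exactly the case where one would have to redo register allocation instead.

```c
#include <stddef.h>

enum { NREGS = 4 };                    /* toy register file: r0..r3 */
typedef enum { MOV, STORE } Op;
/* dst is a register for MOV and a memory slot for STORE. */
typedef struct { Op op; int src, dst; } Insn;

/* Rewrites code[0..n) in place and returns the new length.
   live_out is a bitmask of registers that later blocks still read. */
size_t drop_shuffles(Insn *code, size_t n, unsigned live_out)
{
    int origin[NREGS];   /* origin[r]: entry register whose value r holds */
    size_t i, out = 0;
    int r;

    /* Pass 1: find which entry value each register holds at block end. */
    for (r = 0; r < NREGS; r++)
        origin[r] = r;
    for (i = 0; i < n; i++)
        if (code[i].op == MOV)
            origin[code[i].dst] = origin[code[i].src];
    for (r = 0; r < NREGS; r++)
        if (((live_out >> r) & 1u) && origin[r] != r)
            return n;    /* a later block needs a shuffled value: give up */

    /* Pass 2: delete the moves; each store reads the entry register. */
    for (r = 0; r < NREGS; r++)
        origin[r] = r;
    for (i = 0; i < n; i++) {
        Insn in = code[i];
        if (in.op == MOV) {
            origin[in.dst] = origin[in.src];
        } else {
            in.src = origin[in.src];
            code[out++] = in;
        }
    }
    return out;
}
```

Encoding the snippet above with rax=0, rcx=1, rdx=2, rsi=3 and live_out set to 0 leaves just the three stores; setting the bit for rax in live_out makes the function return the block unchanged.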

So, still work to do...
