|Using extra outputs of an instruction email@example.com (Dobes Vandermeer) (2002-03-09)|
|Re: Using extra outputs of an instruction firstname.lastname@example.org (2002-03-11)|
|Re: Using extra outputs of an instruction email@example.com (2002-03-17)|
|Re: Using extra outputs of an instruction firstname.lastname@example.org (Michael Meissner) (2002-03-21)|
|Re: Using extra outputs of an instruction email@example.com (David Chase) (2002-03-22)|
|Re: Using extra outputs of an instruction firstname.lastname@example.org (Bo Persson) (2002-03-24)|
|From:||Dobes Vandermeer <email@example.com>|
|Date:||9 Mar 2002 02:54:11 -0500|
|Posted-Date:||09 Mar 2002 02:54:11 EST|
I have a written a code generator generator that takes a high-level
description of the CPU and produces an iburg-style labeller/reducer as
well as an assembler/disassembler and implementations of all the iburg
rule numbers. The client program converts the intermediate
representation's nodes into a "simpler" label tree (simpler because it
uses only arithmetic augmented with a READ and WRITE operator) and
then reduces the tree. The reducer functions provide enough
information to allow the reduce to determine which node to reduce
next, as well as register usage etc. (this interface is still fairly
However, I have realized that there are some instructions which are
useless when using tree matching. The first class of instruction is
one which produces multiple independent outputs (e.g. ia32's XCHG
operation which exchanges its operands). The second class is one
which has a side-effect but where there is an equivalent instruction
which does not (e.g. PowerPC's stwu vs. stw, where stwu adjusts the
base register by the offset; also PUSH vs. ADD under intel).
I initially realised the futility when I was entering the
specification of Intel's XCHG operator. I looked at it and said,
"this selector will only see a register move here; it has no reason to
select XCHG over MOV."
Later, I was writing code to emit function prologue's for both
architectures, and I realised that because the instruction selector
can not determine that in one case, the updating write (be it PUSH or
stwu) is useful when that value is re-used for other purposes, and yet
for a "normal" write the update is undesirable, because you'd have to
use a temporary register to store that address for every write.
Clearly what is needed here, is a way to determine at "labelling time"
that there is a cost-saving to having the side-effect in one case, and
yet in another there is no saving.
One idea that has occurred to me is have a rule which makes a choice
between instructions like stw and stwu (store and store with update)
at instruction-selection time. The code-generator-generator would
have to detect relationships like these, and for each possible set of
equivalent instructions, generate a new rule number. Then the
generated labeller would insert the alternate rule number where a
cost-savings *could* be gained. Then the implementation of that rule
would check whether the savings will actually be made; if so, it takes
that register as an additional output and calls the rule with the
side-effect; otherwise, it calls the original rule and its instruction
generates only the single output.
I haven't seen much in the literature about solving this problem; most
people seem to use a peephole optimizer to collapse the two
instructions. To me this would only be acceptable if I could
automatically generate this peephole optimizer and it was fast; this
is entirely possible but I see it as more of a fall-back mode. The
general opinion of DAG generators is that they are in principle
"NP-complete" but a little too time-consuming to bother with.
Any links, papers, suggestions, tips, comments, amusing flame-wars, etc.
are (of course) welcome and appreciated.
Return to the
Search the comp.compilers archives again.