Re: Using extra outputs of an instruction

Michael Meissner <meissner@redhat.com>
21 Mar 2002 21:56:42 -0500

          From comp.compilers

Related articles
Using extra outputs of an instruction dobes@dobesland.com (Dobes Vandermeer) (2002-03-09)
Re: Using extra outputs of an instruction vbdis@aol.com (2002-03-11)
Re: Using extra outputs of an instruction anton@mips.complang.tuwien.ac.at (2002-03-17)
Re: Using extra outputs of an instruction meissner@redhat.com (Michael Meissner) (2002-03-21)
Re: Using extra outputs of an instruction chase@world.std.com (David Chase) (2002-03-22)
Re: Using extra outputs of an instruction bop2@telia.com (Bo Persson) (2002-03-24)
| List of all articles for this month |

From: Michael Meissner <meissner@redhat.com>
Newsgroups: comp.compilers
Date: 21 Mar 2002 21:56:42 -0500
Organization: Compilers Central
References: 02-03-017
Keywords: architecture
Posted-Date: 21 Mar 2002 21:56:41 EST

Dobes Vandermeer <dobes@dobesland.com> writes:


> I initially realised the futility when I was entering the
> specification of Intel's XCHG operator. I looked at it and said,
> "this selector will only see a register move here; it has no reason to
> select XCHG over MOV."


Note, you probably don't want to use XCHG if one of the operands is a
memory operation, since IIRC, it is usually a serializing or cache
flushing instruciton (since the main reason for XCHG to memory is to
implement things like mutexes, and for multiprocessor machines, it
needs to go out to the memory bus). About 11 years ago, I learned
this when I writing the Motorola 88k GCC back end, and added the
optimization to recognize the exchange, and one of the benchmarks used
it (probably either dhrystone or the Stanford integer benchmarks), and
I discovered that the code ran more slowly.


> Later, I was writing code to emit function prologue's for both
> architectures, and I realised that because the instruction selector
> can not determine that in one case, the updating write (be it PUSH or
> stwu) is useful when that value is re-used for other purposes, and yet
> for a "normal" write the update is undesirable, because you'd have to
> use a temporary register to store that address for every write.


It sounds like you need a more powerful intermediate language that can
represent instructions that have more than one output, such as
exchange, memory operations with increment, and division. However,
things like SSA really want a single output.


For example, GCC's intermediate output is based on LISP rather than the more
normal tuples, and can represent this quite well.


For example, the indexed load with update instruction sequence (ie, *p++) would
be initially represented by two separate instructions:


                (set (reg:SI 123)
                          (mem:SI (plus:SI (reg:SI 124)
                                                            (const_int 1))))


                (set (reg:SI 124)
                          (plus:SI (reg:SI 124)
                                            (const_int 1)))


The combination phase of the compiler will attempt to combine these two
instructions to see if there is a complex instruction that handles it. On the
PowerPC there is, so it would generate:


                (parallel [(set (reg:SI 123)
                                                (mem:SI (plus:SI (reg:SI 124)
                                                                                  (const_int 1))))
                                      (set (reg:SI 124)
                                                (plus:SI (reg:SI 124)
                                                                  (const_int 1)))])


Ie, this is one instruction that has two operations, the load, and the
increment. Now, there are places where (parallel) operators aren't handled as
well as they should in GCC, but you should get the idea how it is represented
in GCC's rtl.


> I haven't seen much in the literature about solving this problem; most
> people seem to use a peephole optimizer to collapse the two
> instructions. To me this would only be acceptable if I could
> automatically generate this peephole optimizer and it was fast; this
> is entirely possible but I see it as more of a fall-back mode. The
> general opinion of DAG generators is that they are in principle
> "NP-complete" but a little too time-consuming to bother with.


Generally, peephole processing at the end misses cases. In GCC,
instruction combination occurs after CSE, loop optimizations, and flow
analysis, and before the first scheduling pass and register
allocation.


--
Michael Meissner, Red Hat, Inc.
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA
Work: meissner@redhat.com phone: +1 978-486-9304
Non-work: meissner@the-meissners.org fax: +1 978-692-4482
[The XCHG instruction only locks the bus if it has a LOCK prefix, but it's
slow anyway. -John]



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.