Re: C Compilers which use full 486 capabilities (David Keppel)
18 Mar 91 17:46:49 GMT

          From comp.compilers

Related articles
C Compilers which use full 486 capabilities (Mick Carrick) (1991-03-13)
Re: C Compilers which use full 486 capabilities (1991-03-15)
Re: C Compilers which use full 486 capabilities (1991-03-15)
Re: C Compilers which use full 486 capabilities (1991-03-16)
Re: C Compilers which use full 486 capabilities rex@aussie.COM (1991-03-17)
Re: C Compilers which use full 486 capabilities (1991-03-18)
Re: C Compilers which use full 486 capabilities (1991-03-19)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (David Keppel)
Keywords: 486, optimize
Organization: University of Washington, Computer Science, Seattle
References: <> <> <>
Date: 18 Mar 91 17:46:49 GMT

> (Tony Mason) writes:
>>[quoting from Dr. Dobbs]
>>[Old optimization rules fail on the 486, where, a move to or from memory
>> takes one cycle, but exchanging two registers takes three cycles.] (Preston Briggs) writes:
>Does anyone's compiler generate an exchange instruction?
>I can visualize uses, but I'm not sure how I'd recognize it.
>How would it integrate with register allocation?

John Levine (compilers-request) writes:
>[Most uses I have seen are for atomic operations.]

It is certainly the case that exchange is used for atomic operations.
But that is because there are no `ordinary' instructions that will do
the trick. The other half of the question is ``what non-atomic
operations can make use of the exchange instruction?'' I have an
answer in two parts: First, example code that uses an abstract
exchange operation. Second, details of the the iX86 `xchg' operation.

* Example Code

Here is code that the compiler could recognize as using exchange. I
saw this code yesterday in a data compression algorithm:

t := prev_val;
prev_val := a[i];
a[i] := t;

A reasonable implementation could make use of exchange, and a simple
peephole optimizer could replace instances of

mov rx, ry
ld rx, addr
st ry addr


xchg rx, addr

(as long as `ry' is not used subsequently). I can imagine a register
allocator that instead of doing spills and restores with

mov r0, -46[fp]
mov -50[fp], r0

figures out the lifetimes of the variables, figures out that they're
disjoint (after all, the stack slots got generated in the first place
becase there weren't enough empty registers...), creates one merged
stack slot, and replaces the above with

xchg r0, -44[fp]

This has advantages of (a) smaller code size and (b) better stack
locality (decreasing the stack size and increasing the utilization of
particular stack slots). Note that no analysis of user code is needed
for this latter operation, the spills and restores are

* iX86 `XCHG'

The iX86 has a `lock' prefix that can be applied to several
instructions. When those instructions are `lock'-prefixed, they are
guaranteed to behave atomically wrt memory. On a multiprocessor, if
two processors simultaneously execute

lock incl 4[r0]

then the value at 4[r0] is guaranteed to be incremented by 2 (one for
each processor). The `xchg' instruction can have the `lock' prefix
applied to it but -- for reasons I know not what -- omitting the
`lock' prefix has exactly the same effect, namely the exchange
operation is atomic whether or not you use the `lock' prefix.

Thus, on the i486 reads and writes that hit in the cache never go
off-chip. Except that all `xchg' instructions must assert a line off
chip saying ``don't touch (this) memory right now''. The 3-cycle
estimate is optmistic -- on a multiprocessor with a heavily-loaded
bus, atomic bus operations take substantially longer.

To summarize: it isn't hard to think of cases when exchange could be
useful, but you don't want to do it on the iX86 family because the
exchange instruction is defined as having atomic semantics that must
be enforced at any cost.

;-D on ( Look, ma, no peephole! ) Pardo

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.