Subject: MIPS assembler pessimizing instruction scheduling
Newsgroups: comp.compilers
From: moss@cs.cmu.edu
Organization: Dept of Comp and Info Sci, Univ of Mass (Amherst)
Date: Thu, 3 Dec 1992 15:52:17 GMT
Keywords: assembler
I couldn't think of a better place to post this: it involves using a
compiler, though the real problem may lie in the MIPS assembler (as).
Anyway, here's the deal.
I was trying to help a colleague write a piece of C code that needs to be
really fast -- it copies data to a memory-mapped device register for a
high-speed network interface board. I tried the following code (simplified
to show the problem):
    typedef volatile int vint;
    vint *global_p;   /* initialized elsewhere to point to device register */

    void foo (int *addr) {
        int *src_p = addr;
        vint *dst_p = global_p;
        *dst_p = 0x11223344;   /* header word */
        *dst_p = src_p[0];
        *dst_p = src_p[1];
        *dst_p = src_p[2];
        *dst_p = src_p[3];
        *dst_p = src_p[4];
        *dst_p = src_p[5];
        *dst_p = src_p[6];
        *dst_p = src_p[7];
    }
The compiler allocated src_p and dst_p to registers and the repeated
assignments turned into code of this form:
    lw   $xx,k($src)
    nop
    sw   $xx,0($dst)
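To see what could go wrong if a later load were hoisted above an earlier store, suppose the destination pointer happened to point into the source buffer. The hosted sketch below is illustrative only: `copy_in_order` and `copy_loads_first` are hypothetical names, and an ordinary array element stands in for the device register.

```c
#include <assert.h>

/* Copy n words to a single "register" in the order the C source gives:
   each load happens after the previous store.  A compiler must keep
   this order when it cannot prove dst and src are unrelated. */
static void copy_in_order(volatile int *dst, const int *src, int n) {
    for (int i = 0; i < n; i++)
        *dst = src[i];
}

/* The transformation a scheduler would like to make: do every load
   first, then every store.  Only safe if no store through dst can
   change any src[i]. */
static void copy_loads_first(volatile int *dst, const int *src, int n) {
    int tmp[8];

    assert(n >= 0 && n <= 8);   /* fixed-size scratch buffer */
    for (int i = 0; i < n; i++)
        tmp[i] = src[i];
    for (int i = 0; i < n; i++)
        *dst = tmp[i];
}
```

With `int a[3] = {10, 20, 30}` and the destination set to `&a[2]`, `copy_in_order(&a[2], a, 3)` leaves `a[2] == 20`, while `copy_loads_first` on a fresh copy of the array leaves 30: in the first case the last load reads a word that an earlier store has already overwritten.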
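The nop fills the load delay slot (on the R2000/R3000 a loaded value cannot
be used by the very next instruction). Given that any memory store could (as
far as the compiler and assembler know) write over the src_p data, we cannot
move later loads above earlier stores to fill that slot. To get better code
I tried the following kind of improvement: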
    int t0, t1;
    t0 = src_p[0];
    t1 = src_p[1];
    *dst_p = t0;
    *dst_p = t1;
    t0 = src_p[2];
    ... etc ...
With hopes of getting:
    lw   $t0,k1($src)
    lw   $t1,k2($src)
    sw   $t0,0($dst)
    sw   $t1,0($dst)
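The same load-two, store-two pattern can be written for an arbitrary word count. This is a hypothetical generalization (`copy_pairs` is not from the post), with the destination passed in rather than read from `global_p` so it can run on a host. Writing the C this way only expresses the desired order; the compiler may unroll or reschedule it, and the assembler is free to reorder the plain loads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n words from src to one memory-mapped
   register, loading two words into temporaries before storing both. */
void copy_pairs(volatile int *dst, const int *src, int n) {
    int i;

    assert(dst != NULL && n >= 0 && (src != NULL || n == 0));
    for (i = 0; i + 1 < n; i += 2) {
        int t0 = src[i];        /* two loads ... */
        int t1 = src[i + 1];
        *dst = t0;              /* ... then two stores */
        *dst = t1;
    }
    if (i < n)                  /* odd word count: one word left over */
        *dst = src[i];
}
```

Pointing `dst` at a local `volatile int` lets you check that the last word written is `src[n - 1]`.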
Curiously, I got this instead (with MIPS as 1.something and 2.0, and with
both gcc and DEC/MIPS cc):
    lw   $t1,k2($src)
    lw   $t0,k1($src)
    nop
    sw   $t0,0($dst)
    sw   $t1,0($dst)
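The difference between the hoped-for and the actual sequence comes down to the MIPS I load delay slot: the instruction right after a `lw` must not read the register that `lw` loads, so the assembler inserts a `nop` there. The toy checker below uses hypothetical names and models only the data registers of `lw`/`sw`; it ignores base registers and every other instruction kind. It counts the nops each ordering needs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction model (hypothetical): a LW writes `reg`, a SW reads
   `reg`.  Base registers ($src, $dst) are never loaded in these
   sequences, so they are left out. */
enum op { LW, SW };
struct insn { enum op op; int reg; };

/* Count the nops needed to respect a one-instruction load delay:
   one whenever a SW stores the register loaded by the LW right
   before it. */
int load_delay_nops(const struct insn *code, size_t n) {
    int nops = 0;

    assert(code != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (code[i - 1].op == LW && code[i].op == SW &&
            code[i].reg == code[i - 1].reg)
            nops++;
    }
    return nops;
}
```

Encoding `$t0` as 0 and `$t1` as 1, the original `lw $xx; sw $xx` pair needs one nop, the hoped-for `lw $t0; lw $t1; sw $t0; sw $t1` needs none, and the assembler's `lw $t1; lw $t0; sw $t0; sw $t1` needs one, matching the nop in its output.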
The assembly code produced by the compilers looked like this:
    lw   $t0,k1($src)
    lw   $t1,k2($src)
    .set volatile
    sw   $t0,0($dst)
    .set novolatile
    .set volatile
    sw   $t1,0($dst)
    .set novolatile
Clearly the assembler is making the code worse! There is a solution,
though I found it a bit distasteful: FORCE the ordering of the loads by
using volatile on them as well. This is accomplished by writing "vint
*src_p = (vint*)(addr)" rather than "int *src_p = addr". Doing so I got
the desired code.
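A sketch of the volatile-source version, assuming the same eight-word layout as `foo` above. `foo_volatile_src` is a hypothetical name, and the destination is a parameter rather than `global_p` so the function can be exercised on a host. Because every load is now a volatile access, the compiler must perform the loads and stores in the order written; with the compilers and assemblers tried above, that produced the desired load, load, store, store schedule.

```c
#include <assert.h>
#include <stddef.h>

typedef volatile int vint;

/* Hypothetical host-testable variant of foo(): dst_p is passed in.
   Reading the source through a volatile pointer makes each load a
   volatile access, so loads and stores stay in program order. */
void foo_volatile_src(vint *dst_p, int *addr) {
    vint *src_p = (vint *)addr;   /* the cast described above */
    int t0, t1;

    assert(dst_p != NULL && addr != NULL);
    *dst_p = 0x11223344;          /* header word */

    t0 = src_p[0];                /* pair 1: load, load, store, store */
    t1 = src_p[1];
    *dst_p = t0;
    *dst_p = t1;

    t0 = src_p[2];                /* pair 2 */
    t1 = src_p[3];
    *dst_p = t0;
    *dst_p = t1;

    t0 = src_p[4];                /* pair 3 */
    t1 = src_p[5];
    *dst_p = t0;
    *dst_p = t1;

    t0 = src_p[6];                /* pair 4 */
    t1 = src_p[7];
    *dst_p = t0;
    *dst_p = t1;
}
```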
I would be interested in understanding what algorithm the assembler
applies that (presumably) improves things in some cases but makes things
worse here. I would also like to know whether there is a MIPS assembler
that does better.
--
J. Eliot B. Moss
Associate Professor
Department of Computer Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206, 545-1249 (fax)
Moss@cs.umass.edu

Visiting Associate Professor
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891
(412) 268-6767, 681-5739 (fax)
Moss@cs.cmu.edu