MIPS assembler pessimizing instruction scheduling

moss@cs.cmu.edu
Thu, 3 Dec 1992 15:52:17 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: moss@cs.cmu.edu
Organization: Dept of Comp and Info Sci, Univ of Mass (Amherst)
Date: Thu, 3 Dec 1992 15:52:17 GMT
Keywords: assembler

I couldn't think of a better place to post this: it concerns the use of a
compiler, though the underlying problem may lie in the MIPS assembler
(as). Anyway, here's the deal.


I was trying to help a colleague write a piece of C code that needs to be
really fast -- it copies data to a memory-mapped device register for a
high-speed network interface board. I tried the following code (simplified
to show the problem):


        typedef volatile int vint;
        vint *global_p; /* initialized elsewhere to point to device register */
        void foo (int *addr) {
            int *src_p = addr;
            vint *dst_p = global_p;
            *dst_p = 0x11223344; /* header word */
            *dst_p = src_p[0];
            *dst_p = src_p[1];
            *dst_p = src_p[2];
            *dst_p = src_p[3];
            *dst_p = src_p[4];
            *dst_p = src_p[5];
            *dst_p = src_p[6];
            *dst_p = src_p[7];
        }


The compiler allocated src_p and dst_p to registers and the repeated
assignments turned into code of this form:


        lw $xx,k($src)
        nop
        sw $xx,0($dst)


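The nop fills the load delay slot, since each store uses the register just
loaded. Because any memory store could (as far as the compiler and
assembler know) overwrite the src_p data, no later load can be moved above
an earlier store to fill that slot. To get better code I tried grouping the
loads and stores in pairs: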


            int t0, t1;
            t0 = src_p[0];
            t1 = src_p[1];
            *dst_p = t0;
            *dst_p = t1;
            t0 = src_p[2];
            ... etc ...


I hoped to get:


        lw $t0,k1($src)
        lw $t1,k2($src)
        sw $t0,0($dst)
        sw $t1,0($dst)


Curiously, I got this instead (with MIPS as 1.something and 2.0, under both
gcc and DEC/MIPS cc):


        lw $t1,k2($src)
        lw $t0,k1($src)
        nop
        sw $t0,0($dst)
        sw $t1,0($dst)
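

To make the cost of the two orderings concrete, here is a minimal sketch
that counts the nops a load delay slot would force in a short lw/sw
sequence. It assumes a simple model in which the instruction immediately
after a load may not read the register being loaded. The struct, the
register labels, and the function names are all hypothetical and are not
taken from any assembler.

```c
#include <stddef.h>

enum op { OP_LW, OP_SW };

/* Hypothetical register labels; only their distinctness matters here. */
enum reg { R_SRC = 4, R_DST = 5, R_T0 = 8, R_T1 = 9 };

struct insn {
    enum op op;
    int data_reg;   /* register loaded by lw or stored by sw */
    int base_reg;   /* base address register */
};

/* Returns 1 if the instruction reads register r, 0 otherwise. */
static int reads_reg(const struct insn *in, int r) {
    if (in->base_reg == r)
        return 1;
    return in->op == OP_SW && in->data_reg == r;
}

/* Counts adjacent pairs where a load is followed by a use of its result. */
size_t count_load_delay_nops(const struct insn *seq, size_t n) {
    size_t nops = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (seq[i].op == OP_LW && reads_reg(&seq[i + 1], seq[i].data_reg))
            nops++;
    }
    return nops;
}

/* The ordering hoped for: lw t0, lw t1, sw t0, sw t1. */
static const struct insn desired_seq[] = {
    { OP_LW, R_T0, R_SRC }, { OP_LW, R_T1, R_SRC },
    { OP_SW, R_T0, R_DST }, { OP_SW, R_T1, R_DST },
};

/* The ordering actually obtained: lw t1, lw t0, sw t0, sw t1. */
static const struct insn observed_seq[] = {
    { OP_LW, R_T1, R_SRC }, { OP_LW, R_T0, R_SRC },
    { OP_SW, R_T0, R_DST }, { OP_SW, R_T1, R_DST },
};
```

Under this model count_load_delay_nops(desired_seq, 4) gives 0 and
count_load_delay_nops(observed_seq, 4) gives 1: swapping the two loads puts
the load of $t0 directly ahead of the store that needs it.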


The assembly code that the compilers produced and handed to the assembler
looked like this:


        lw $t0,k1($src)
        lw $t1,k2($src)
        .set volatile
        sw $t0,0($dst)
        .set novolatile
        .set volatile
        sw $t1,0($dst)
        .set novolatile


Clearly the assembler is making the code worse! It swaps the two loads, so
the load of $t0 lands immediately before the store that uses it, and a nop
is needed. There is a solution, though I found it a bit distasteful: FORCE
the ordering of the loads by making them volatile as well. This is
accomplished by writing "vint *src_p = (vint *)addr" rather than
"int *src_p = addr". Doing so, I got the desired code.
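

Spelled out in full, the workaround might look like the sketch below. The
function name foo_ordered is hypothetical, and the t0/t1 pairing is
extended across all eight words on the assumption that the elided code
continues the same pattern. Whether a given assembler then emits the
load/load/store/store sequence depends on the toolchain.

```c
typedef volatile int vint;

vint *global_p; /* assumed to be set elsewhere to the device register */

/* Hypothetical name; differs from foo() in the type of src_p and in
 * grouping the loads and stores in pairs. */
void foo_ordered(int *addr) {
    vint *src_p = (vint *)addr; /* volatile view forces the load order */
    vint *dst_p = global_p;
    int t0, t1;

    *dst_p = 0x11223344; /* header word */

    t0 = src_p[0];
    t1 = src_p[1];
    *dst_p = t0;
    *dst_p = t1;

    t0 = src_p[2];
    t1 = src_p[3];
    *dst_p = t0;
    *dst_p = t1;

    t0 = src_p[4];
    t1 = src_p[5];
    *dst_p = t0;
    *dst_p = t1;

    t0 = src_p[6];
    t1 = src_p[7];
    *dst_p = t0;
    *dst_p = t1;
}
```

Because every access now goes through a volatile lvalue, the C compiler
must keep the loads and stores in source order; the open question is only
what the assembler does with them afterwards.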


I would be interested in understanding what algorithm the assembler
applied that (presumably) improves things in some cases, but makes things
worse here. I would also like to know whether there is a MIPS assembler
that does better.
--
J. Eliot B. Moss, Associate Professor    Visiting Associate Professor
Department of Computer Science           School of Computer Science
Lederle Graduate Research Center         Carnegie Mellon University
University of Massachusetts              5000 Forbes Avenue
Amherst, MA 01003                        Pittsburgh, PA 15213-3891
(413) 545-4206, 545-1249 (fax)           (412) 268-6767, 681-5739 (fax)
Moss@cs.umass.edu                        Moss@cs.cmu.edu
--

