Re: Folk Theorem: Assemblers are superior to Compilers

Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Mon, 15 Nov 1993 18:17:54 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Keywords: assembler, optimize, performance
Organization: Compilers Central
References: 93-10-114 93-11-084
Date: Mon, 15 Nov 1993 18:17:54 GMT

[I wrote:]
>How many languages have a declaration that
>tells the compiler that a given pointer, or even a given integer, is a
>multiple of 16?


Ron Guilmette writes:
> In the case of the C language, we are (I think) fortunate to have certain
> "industry standards", which, in many cases, go beyond the requirements
> laid down by the international ISO C standard.


We know about that industry standard, and it's saved our bacon; it would
be incredibly painful for the programmer to arrange for proper alignment
if "new" and "malloc" didn't give that guarantee.


I don't think our compiler guarantees arrays on the stack to be
quadword aligned; the documentation certainly doesn't mention any
such guarantee, and we have never needed to check it out.


> In the case of the i860 (in particular) the ps-ABI for this processor does
> indeed require compilers to align all data objects (and members of struct
> and union types) which have type `long double' to 16 bytes boundaries.


I think you may have missed my point: It's not that we want to load one
quad-float at once; it's that we want to load *four* single-floats at
once. Say you're doing a vector "a = b*c" operation; for every one-cycle
multiply, you need three load/stores (two loads and one store). With a
bit of loop unrolling plus load/store-quad, you can get your three
load/stores per cycle with room to spare.
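
To make the access pattern concrete, here is a minimal sketch of the "a = b*c" loop unrolled by four. The name "vmul4" and its signature are made up for illustration, and the code assumes the three arrays do not overlap. It is plain C, not i860 code; it only shows the shape of the loop in which each trip touches four adjacent floats per array, which is what a load/store-quad could cover if the pointers were known to be 16-byte aligned.

```c
#include <stddef.h>

/* Element-wise a[i] = b[i] * c[i], unrolled by four.
   Hypothetical helper; assumes a, b and c do not overlap. */
void vmul4(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;

    /* Main loop: each trip reads four adjacent floats (16 bytes) from
       b and from c and writes four to a. With 16-byte-aligned pointers
       each group of four could be moved by one quad load or store. */
    for (; i + 4 <= n; i += 4) {
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
        a[i + 2] = b[i + 2] * c[i + 2];
        a[i + 3] = b[i + 3] * c[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        a[i] = b[i] * c[i];
}
```

Nothing in this source tells the compiler that "a", "b" and "c" are 16-byte aligned, which is exactly the missing piece of information.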


This is really an issue of information at the procedure-call boundary.
(In that sense it's a relative of the infamous "noalias" problem.) Say I
have a function


double sum_vector(double *p, int n);


At first glance, the ABI might imply that "sum_vector" can assume that "p"
is quadword aligned on an 860. But of course it can't; there's nothing
stopping the programmer from writing


double array[10];
double last_five = sum_vector(&array[5], 5);


The pointer "p" has the wrong alignment now: even if "array" starts on a
16-byte boundary, &array[5] sits 40 bytes in, which is only 8-byte
aligned. And this is nothing specific to C; even number-friendly FORTRAN
has this problem. The only ways around it are exhaustive interprocedural
analysis, non-standard declarations, or having the compiler automatically
write "sum_vector" in the form of


if (happy(p)) <fast-loop> else <slow-loop>


which is hard to make into a general solution.
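
Written out by hand, the happy(p) form might look like the sketch below. The test inside "happy" (address modulo 16) is an assumption about what "happy" would mean here, and the fast loop is ordinary C standing in for whatever load-quad code a compiler or assembly programmer would actually emit.

```c
#include <stdint.h>

/* Hypothetical alignment test: nonzero if p is on a 16-byte boundary. */
static int happy(const double *p)
{
    return ((uintptr_t)(const void *)p % 16) == 0;
}

double sum_vector(double *p, int n)
{
    double s0 = 0.0, s1 = 0.0;
    int i = 0;

    if (happy(p)) {
        /* Fast loop: p[i] and p[i + 1] share one aligned 16-byte block,
           so the pair could be fetched with a single quad load. */
        for (; i + 2 <= n; i += 2) {
            s0 += p[i];
            s1 += p[i + 1];
        }
    }

    /* Slow loop: one element at a time. It handles the unaligned case
       and any element left over after the fast loop. */
    for (; i < n; i++)
        s0 += p[i];

    return s0 + s1;
}
```

Note that the two paths add the elements in a different order, so floating-point results can differ in the last bits depending on the alignment of "p". That is one more reason a compiler cannot apply this transformation freely.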


The compiler we use offers none of these, so the load-quad instruction is
simply out of its reach.
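
For comparison, a "non-standard declaration" of the kind meant above might look like the following sketch. It uses the GCC/Clang builtin "__builtin_assume_aligned", which is an assumption on my part: it is an extension from much later compilers, not something the i860 compiler discussed here provides. The macro name "ASSUME_ALIGNED16" and the function name are made up for illustration.

```c
#include <stddef.h>

/* On GCC and Clang, promise the compiler that ptr is 16-byte aligned.
   On other compilers the macro does nothing. */
#if defined(__GNUC__)
#define ASSUME_ALIGNED16(ptr) __builtin_assume_aligned((ptr), 16)
#else
#define ASSUME_ALIGNED16(ptr) (ptr)
#endif

/* Hypothetical variant of sum_vector whose contract requires p to be
   16-byte aligned. Passing an unaligned pointer is undefined behavior. */
double sum_vector_aligned(const double *p, size_t n)
{
    const double *q = ASSUME_ALIGNED16(p);
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        s += q[i];

    return s;
}
```

The hint moves the burden to the caller: the &array[5] call shown earlier would break the promise, and nothing in the language would catch it.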


-- Dave
--

