Re: Folk Theorem: Assemblers are superior to Compilers |
---|
Newsgroups: | comp.compilers |
From: | Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET> |
Keywords: | assembler, optimize, performance |
Organization: | Compilers Central |
References: | 93-10-114 93-11-084 |
Date: | Mon, 15 Nov 1993 18:17:54 GMT |
[I wrote:]
>How many languages have a declaration that
>tells the compiler that a given pointer, or even a given integer, is a
>multiple of 16?
Ron Guilmette writes:
> In the case of the C language, we are (I think) fortunate to have certain
> "industry standards", which, in many cases, go beyond the requirements
> laid down by the international ISO C standard.
We know about that industry standard, and it's saved our bacon: it would
be incredibly painful for the programmer to arrange for proper alignment
if "new" and "malloc" didn't give that guarantee.
I don't think our compiler guarantees arrays on the stack to be
quadword aligned; the documentation certainly doesn't mention any
such guarantee, and we have never needed to check it out.
> In the case of the i860 (in particular) the ps-ABI for this processor does
> indeed require compilers to align all data objects (and members of struct
> and union types) which have type `long double' to 16 bytes boundaries.
I think you may have missed my point: It's not that we want to load one
quad-float at once, it's that we want to load *four* single-floats at
once. Say you're doing a vector "a = b*c" operation; for every one-cycle
multiply, you need three load/stores. With a bit of loop unrolling plus
load/store-quad, you can get your three load/stores per cycle with room to
spare.
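To make the count concrete: four elements of "a = b*c" need four
multiplies and twelve single-float load/stores, but only three quad
load/stores. Here is a minimal C sketch of the unrolled loop. The name
"vec_mul_unrolled" and the use of "restrict" are my own assumptions for
illustration, not anything our compiler provides, and the sketch assumes
the three arrays do not overlap. Nothing in it tells the compiler that the
pointers are 16-byte aligned, which is exactly the missing piece.

```c
#include <stddef.h>

/* a[i] = b[i] * c[i], unrolled four elements at a time.
 * Assumption: a, b and c do not overlap (hence restrict).
 * Each unrolled pass touches 16 contiguous bytes of each array, the shape
 * that could become two quad loads and one quad store if the compiler
 * knew the pointers were 16-byte aligned. Plain C cannot say so here. */
void vec_mul_unrolled(float *restrict a, const float *restrict b,
                      const float *restrict c, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* candidate quad load of b[i..i+3] */
        float b0 = b[i], b1 = b[i + 1], b2 = b[i + 2], b3 = b[i + 3];
        /* candidate quad load of c[i..i+3] */
        float c0 = c[i], c1 = c[i + 1], c2 = c[i + 2], c3 = c[i + 3];

        /* four multiplies, then a candidate quad store of a[i..i+3] */
        a[i]     = b0 * c0;
        a[i + 1] = b1 * c1;
        a[i + 2] = b2 * c2;
        a[i + 3] = b3 * c3;
    }

    /* leftover elements when n is not a multiple of four */
    for (; i < n; i++)
        a[i] = b[i] * c[i];
}
```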
This is really an issue of information at the procedure-call boundary.
(In that sense it's a relative of the infamous "noalias" problem.) Say I
have a function
double sum_vector(double *p, int n);
At first glance, the ABI might imply that "sum_vector" can assume that "p"
is quadword aligned on an 860. But of course it can't; there's nothing
stopping the programmer from writing
double array[10];
double last_five = sum_vector(&array[5], 5);
The pointer "p" has the wrong alignment now. And this is nothing specific
to C; even number-friendly FORTRAN has this problem. The only ways around
it are exhaustive interprocedural analysis, non-standard declarations, or
having the compiler automatically write "sum_vector" in the form of
if (happy(p)) <fast-loop> else <slow-loop>
which is hard to make into a general solution.
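Written out by hand, the two-loop form might look like the sketch below.
Everything in it is an assumption for illustration: "happy" is taken to
mean "on a 16-byte boundary", the function is renamed
"sum_vector_dispatch" to keep it apart from the declaration above,
doubles are assumed to be 8 bytes, and the pointer-to-integer test
assumes a flat address space. If "array" starts on a 16-byte boundary,
&array[5] is 40 bytes in, so it sits 8 bytes past a boundary and takes
the slow loop.

```c
#include <stdint.h>

/* Hypothetical stand-in for happy(p): nonzero when p is 16-byte aligned. */
int happy(const double *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Sum n doubles, choosing the loop at run time from the pointer's alignment. */
double sum_vector_dispatch(const double *p, int n)
{
    double s0 = 0.0, s1 = 0.0;
    int i = 0;

    if (happy(p)) {
        /* fast loop: each pass reads one aligned 16-byte pair of doubles,
         * which a compiler could fetch with a single quad load */
        for (; i + 2 <= n; i += 2) {
            s0 += p[i];
            s1 += p[i + 1];
        }
    }

    /* slow loop: all elements if unaligned, otherwise at most one leftover */
    for (; i < n; i++)
        s0 += p[i];

    return s0 + s1;
}
```

Note that the fast loop adds the elements in a different order from the
slow one, so the two paths can differ in the last bits of a
floating-point result.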
The compiler we use offers none of these, so the load-quad instruction is
simply out of its reach.
-- Dave