Subject: Re: Folk Theorem: Assemblers are superior to Compilers
From: Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Keywords: assembler, optimize, performance
Date: Mon, 15 Nov 1993 18:17:54 GMT
>How many languages have a declaration that
>tells the compiler that a given pointer, or even a given integer, is a
>multiple of 16?
Ron Guilmette writes:
> In the case of the C language, we are (I think) fortunate to have certain
> "industry standards", which, in many cases, go beyond the requirements
> laid down by the international ISO C standard.
We know about that industry standard, and it's saved our bacon; it would
be incredibly painful for the programmer to arrange for proper alignment
if "new" and "malloc" didn't give that guarantee.
I don't think our compiler guarantees arrays on the stack to be
quadword aligned; the documentation certainly doesn't mention any
such guarantee, and we have never needed to check it out.
> In the case of the i860 (in particular) the ps-ABI for this processor does
> indeed require compilers to align all data objects (and members of struct
> and union types) which have type `long double' to 16 bytes boundaries.
I think you may have missed my point: It's not that we want to load one
quad-float at once, it's that we want to load *four* single-floats at
once. Say you're doing a vector "a = b*c" operation; for every one-cycle
multiply, you need three load/stores. With a bit of loop unrolling plus
load/store-quad, you can get your three load/stores per cycle with room to spare.
This is really an issue of information at the procedure-call boundary.
(In that sense it's a relative of the infamous "noalias" problem.) Say I
have a function
double sum_vector(double *p, int n);
At first glance, the ABI might imply that "sum_vector" can assume that "p"
is quadword aligned on an 860. But of course it can't; there's nothing
stopping the programmer from writing
double last_five = sum_vector(&array[len - 5], 5);
The pointer "p" has the wrong alignment now. And this is nothing specific
to C; even number-friendly FORTRAN has this problem. The only ways to
solve it are exhaustive interprocedural analysis, non-standard
declarations, or having the compiler automatically write "sum_vector" in
the form of
if (happy(p)) <fast-loop> else <slow-loop>
which is hard to make into a general solution.
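To make the if/else form above concrete, here is a minimal hand-written sketch in C. It is only an illustration, not what any particular compiler emits: "is_quad_aligned" is a hypothetical stand-in for "happy(p)", the fast loop merely walks memory in 16-byte steps the way a load-quad version would, and it assumes 8-byte doubles and a platform that provides uintptr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "happy(p)": nonzero when p lies on a
   16-byte (quadword) boundary. */
static int is_quad_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

double sum_vector(double *p, int n)
{
    double sum = 0.0;
    int i = 0;

    assert(n >= 0);

    if (is_quad_aligned(p)) {
        /* Fast loop: two doubles (16 bytes) per iteration, always starting
           on a quadword boundary, so a wide load would be legal here. */
        double s0 = 0.0, s1 = 0.0;
        for (; i + 1 < n; i += 2) {
            s0 += p[i];
            s1 += p[i + 1];
        }
        sum = s0 + s1;
    }

    /* Slow loop: one element at a time. It handles the unaligned case
       entirely and the leftover element after the fast loop. */
    for (; i < n; i++)
        sum += p[i];

    return sum;
}
```

With a 16-byte-aligned array of doubles, a call such as sum_vector(&array[0], 8) takes the fast loop, while sum_vector(&array[3], 5) fails the check and falls through to the slow loop. Note that the two paths add the elements in a different order, so the results can differ in the last bits for general floating-point data; that is one more reason the transformation is awkward for a compiler to apply on its own.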
The compiler we use offers none of these, so the load-quad instruction is
simply out of its reach.