Subject: Re: Use of unaligned load/stores by compilers
From: email@example.com (Herman Rubin)
Date: 1 Feb 1998 21:41:52 -0500
Organization: Purdue University Statistics Department
References: 98-01-099 98-01-114 98-01-119 98-02-008
>Reid Tatge <firstname.lastname@example.org> writes:
>> Even in this X86-dominated world, it always surprises me how many
>> programmers are completely unaware that there is such a thing as
>> hardware enforced alignment of memory references. <Sigh>
>> [I think this is the 1990s equivalent of the old observation that a
>> determined programmer can write a Fortran program in any language. -John]
Tom Lane <email@example.com> wrote:
>Perhaps you can take some comfort in the fact that said programmers
>are paying a huge runtime price for their cluelessness, even on their
>favorite X86 architecture.
I have never used an X86, but I have deliberately set up unaligned
loads and stores on a VAX. The runtime was much faster than doing
the job any other way.
>Misaligned reads and writes require
>multiple memory cycles on the X86 series, too (at least on anything
>with more than an 8-bit path to memory).
This may be highly architecture dependent. See the moderator's
note below.
The fact remains that there are cases where, if the hardware allows
it at all and the job can be done by unaligned loads and stores, it
may very well be cheaper to do it that way than to come up with
workarounds. Moving an aligned 8-byte quadword to an unaligned
location can be faster than moving a computed number of bytes
starting at that location. In my case the hardware handled the
unaligned destination pointer properly, and adjusting it as a byte
pointer was the fastest way to do the job.
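
The trade-off described above can be sketched in C. This is only an
illustration, not the VAX code from the post: the names pack_exact and
pack_quad are hypothetical, and the fixed-width version assumes the
output buffer always has 8 writable bytes at the current position.
Whether it is actually faster depends entirely on the hardware and
the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PACK_FAIL ((size_t)-1)

/* Baseline: copy exactly lens[i] bytes of each 8-byte field, so every
 * copy has a length that is only known at run time. */
static size_t pack_exact(unsigned char *out, size_t cap,
                         const unsigned char fields[][8],
                         const size_t *lens, size_t count)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        if (lens[i] > 8 || lens[i] > cap - pos)
            return PACK_FAIL;
        memcpy(out + pos, fields[i], lens[i]);
        pos += lens[i];
    }
    return pos;
}

/* Store a full quadword at dst, which may be unaligned. memcpy with a
 * constant size is the portable way to express this in C; compilers
 * commonly turn it into a single load and store where the target allows. */
static void store_quad(unsigned char *dst, const unsigned char *src)
{
    uint64_t q;
    memcpy(&q, src, sizeof q);
    memcpy(dst, &q, sizeof q);
}

/* Fixed-width variant: always write 8 bytes, then advance the byte
 * pointer by only the wanted length. The unwanted tail bytes are
 * overwritten by the next store (or left as scratch after the last). */
static size_t pack_quad(unsigned char *out, size_t cap,
                        const unsigned char fields[][8],
                        const size_t *lens, size_t count)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        if (lens[i] > 8 || cap < 8 || pos > cap - 8)
            return PACK_FAIL;   /* assumed precondition: 8 bytes of room */
        store_quad(out + pos, fields[i]);
        pos += lens[i];
    }
    return pos;
}
```

Both routines produce the same packed prefix; the second trades a few
wasted bytes per store for a fixed-size move with no length computation.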
> regards, tom lane
>[The fact that it's so slow suggests that in fact they don't care all
>that much. I was surprised to read that in the later IBM 390
>machines, there's very little penalty for misaligned data. The cache
>lines are quite long, so more often than not misaligned data is still
>in a single line in which case there's no penalty, or it's in two
>lines in which case there's in effect an extra load or store.
In the cases above on the VAX, there were two stores involved. The
amount of overhead on the instruction to move a string was far greater
than the additional store time, even though not all bytes were wanted.
>takes a three cycle hit if misaligned data is in a single line and a
>six to 12 cycle hit if it crosses into a second line. Intel's x86
>optimization manual says it's an important performance issue, and the
>Ppro and PII can count misaligned accesses as part of the internal
>profiling features. -John]
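
The moderator's distinction between misaligned data that stays within
one cache line and data that straddles two can be made concrete with a
little arithmetic. The sketch below is illustrative only: the function
names are hypothetical, and the 64-byte line size used in the example
is an assumption, not a figure from the post.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if addr is not a multiple of the access size
 * (size is assumed to be a power of two, e.g. 2, 4, or 8). */
static int is_misaligned(uintptr_t addr, size_t size)
{
    return addr % size != 0;
}

/* Nonzero if an access of `size` bytes at addr spills into the next
 * cache line. Assumes 0 < size <= line. */
static int crosses_line(uintptr_t addr, size_t size, size_t line)
{
    return (addr % line) + size > line;
}

/* Count the misaligned starting offsets within one line, and how many
 * of those cross into the following line. */
static void count_offsets(size_t size, size_t line,
                          size_t *misaligned, size_t *crossing)
{
    *misaligned = 0;
    *crossing = 0;
    for (size_t off = 0; off < line; off++) {
        if (!is_misaligned(off, size))
            continue;
        (*misaligned)++;
        if (crosses_line(off, size, line))
            (*crossing)++;
    }
}
```

For an 8-byte access and an assumed 64-byte line, 56 of the 64 possible
offsets are misaligned and only 7 of those (offsets 57 through 63) cross
into a second line, which matches the "more often than not" remark.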
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
firstname.lastname@example.org Phone: (765)494-6054 FAX: (765)494-0558