Re: Use of unaligned load/stores by compilers

hrubin@stat.purdue.edu (Herman Rubin)
1 Feb 1998 21:41:52 -0500

          From comp.compilers

Newsgroups: comp.compilers
Organization: Purdue University Statistics Department
References: 98-01-099 98-01-114 98-01-119 98-02-008
Keywords: architecture

>Reid Tatge <reid@micro.ti.com> writes:
>> Even in this X86-dominated world, it always surprises me how many
>> programmers are completely unaware that there is such a thing as
>> hardware enforced alignment of memory references. <Sigh>


>> [I think this is the 1990s equivalent of the old observation that a
>> determined programmer can write a Fortran program in any language. -John]


Tom Lane <tgl@netcom.com> wrote:
>Perhaps you can take some comfort in the fact that said programmers
>are paying a huge runtime price for their cluelessness, even on their
>favorite X86 architecture.


I have never used an X86, but I have deliberately set up unaligned
loads and stores on a VAX. The runtime was much faster than doing
the job any other way.


>Misaligned reads and writes require
>multiple memory cycles on the X86 series, too (at least on anything
>with more than an 8-bit path to memory).


This may be highly architecture dependent. See the moderator's
comments below.


The fact remains that there are cases where, if the hardware allows
unaligned loads and stores at all and they can do the job, using them
may well be cheaper than coming up with workarounds. Moving an aligned
8-byte quadword to an unaligned location can be faster than moving a
computed number of bytes starting at that location. In my case the
hardware handled the unaligned destination pointer properly, and
adjusting the pointer as a byte pointer was the fastest way to do it.
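
As a rough illustration of the trade-off described above, here is a
minimal C sketch of the two approaches. It is not the VAX code in
question; the function names are hypothetical, and the fixed-size
memcpy is simply the portable C way to ask for an 8-byte store to a
possibly unaligned address. Whether it becomes a single store
instruction depends on the compiler and the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write all 8 bytes of `value` at `dst`, which may
 * be unaligned. The caller must own 8 writable bytes at `dst`, even if
 * only some of them are actually wanted. On targets that permit
 * unaligned stores, compilers commonly turn this fixed-size memcpy into
 * one store instruction (an assumption about the toolchain). */
static void store_quad_unaligned(unsigned char *dst, uint64_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical alternative: write only the first `n` bytes of `value`
 * (in memory order), where `n` is computed at run time. The loop and
 * its count are the overhead the fixed-size store avoids. */
static void store_bytes_counted(unsigned char *dst, uint64_t value, size_t n)
{
    const unsigned char *src = (const unsigned char *)&value;

    assert(n <= sizeof value);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The first version writes more bytes than may be wanted, so it is only
usable when overwriting the extra bytes is harmless or they are
rewritten afterwards.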


...............


> regards, tom lane




>[The fact that it's so slow suggests that in fact they don't care all
>that much. I was surprised to read that in the later IBM 390
>machines, there's very little penalty for misaligned data. The cache
>lines are quite long, so more often than not misaligned data is still
>in a single line in which case there's no penalty, or it's in two
>lines in which case there's in effect an extra load or store.


In the cases above on the VAX, there were two stores involved. The
overhead of the string-move instruction was far greater than the cost
of the additional store, even though not all of the stored bytes were
wanted.


>The x86
>takes a three cycle hit if misaligned data is in a single line and a
>six to 12 cycle hit if it crosses into a second line. Intel's x86
>optimization manual says it's an important performance issue, and the
>Ppro and PII can count misaligned accesses as part of the internal
>profiling features. -John]
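
The distinction drawn in the moderator's note, between a misaligned
access that stays within one cache line and one that spills into a
second line, can be sketched as a small C check. The function names
are hypothetical, and the line size is a parameter because it varies
by processor; both helpers assume power-of-two sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes at `addr` is not aligned to
 * its own size. Assumes `size` is a power of two. */
static int is_misaligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) != 0;
}

/* Returns 1 if the bytes [addr, addr + size) fall in two cache lines.
 * Assumes `line_size` is a power of two and 1 <= size <= line_size. */
static int crosses_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size >= 1 && size <= line_size);
    assert((line_size & (line_size - 1)) == 0);

    /* Offset of the first byte within its line, plus the access
     * length, exceeds the line only if the access spills over. */
    return (addr & (line_size - 1)) + size > line_size;
}
```

With a 32-byte line, for example, a 4-byte access at offset 29 is
misaligned but stays in one line, while the same access at offset 30
crosses into the next line.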
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558