Wisconsin Wind Tunnel ASPLOS Preprints

larus@cs.wisc.edu (James Larus)
Mon, 25 Jul 1994 11:32:57 GMT

          From comp.compilers


Newsgroups: comp.arch,comp.parallel,comp.sys.super,comp.compilers
From: larus@cs.wisc.edu (James Larus)
Originator: paramod@crystal.epcc.ed.ac.uk
Organization: U of Wisconsin CS Dept
Date: Mon, 25 Jul 1994 11:32:57 GMT

The Wisconsin Wind Tunnel (WWT) project has made available online
three parallel computing papers that will appear in ASPLOS VI in
October:


* asplos6_fine_grain.ps discusses techniques for implementing
      fine-grain distributed shared memory with two case studies on a
      Thinking Machines CM-5.


* asplos6_sm_mp.ps compares four shared-memory and message-passing
      programs running on detailed architectural simulators of comparable
      machines.


* asplos6_lcm.ps shows how a custom memory system, built on user-level
      distributed shared memory, can help support C**, a high-level
      data-parallel language.


Below is information on the Wisconsin Wind Tunnel project,
instructions for online access, and abstracts for the three papers.


------------------------------------------------------------------------


Title: The Wisconsin Wind Tunnel Project
PIs: Mark D. Hill, James R. Larus, David A. Wood
Email: wwt@cs.wisc.edu
Mosaic Home Page: http://www.cs.wisc.edu/p/wwt/Mosaic/wwt.html
Anonymous FTP: ftp ftp.cs.wisc.edu; cd wwt


Most future massively-parallel computers will be built from
workstation-like nodes and programmed in high-level parallel
languages--like HPF--that support a shared address space in which
processes uniformly reference data.


The Wisconsin Wind Tunnel project seeks to develop a consensus about
the middle-level interface--below languages and compilers and above
system software and hardware. Our first proposed interface was
COOPERATIVE SHARED MEMORY, which is an evolutionary extension to
conventional shared-memory software and hardware. Recently, we have
been working on a more revolutionary interface called TEMPEST.
TEMPEST provides the mechanisms that allow programmers, compilers, and
program libraries to implement and use message passing, transparent
shared memory, and hybrid combinations of the two. We are developing
implementations of TEMPEST on a Thinking Machines CM-5, a second real
platform, and a hypothetical hardware platform.


We refine our design ideas with an execution-driven simulation system
called the WISCONSIN WIND TUNNEL. It runs a parallel shared-memory
program on a parallel computer (Thinking Machines CM-5) and uses
execution-driven, distributed, discrete-event simulation to accurately
calculate program execution time. The Wisconsin Wind Tunnel project
is so named because we use our tools to cull the design space of
parallel supercomputers in a manner similar to how aeronautical
engineers use conventional wind tunnels to design airplanes.
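

To give a concrete flavor of this approach (the sketch below is illustrative,
not the actual Wind Tunnel code), directly executed instructions can be
charged a nominal cost against a per-node virtual clock, while simulated
remote misses are charged a modeled latency.  The constants, the node
structure, and the reference() routine here are assumptions made only for
the example.

/*
 * Minimal single-node sketch of execution-driven timing: ordinary
 * (directly executed) references cost one cycle, simulated remote
 * misses cost an assumed fixed latency.  Illustrative only.
 */
#include <stdio.h>

#define HIT_COST       1     /* assumed cost of a local reference (cycles) */
#define MISS_LATENCY 100     /* assumed remote-miss latency (cycles) */

struct node {
    unsigned long vclock;    /* virtual (target) time in cycles */
};

/* Charge one memory reference to the node's virtual clock. */
static void reference(struct node *n, int hit)
{
    n->vclock += hit ? HIT_COST : MISS_LATENCY;
}

int main(void)
{
    struct node n = { 0 };
    int trace[] = { 1, 1, 0, 1, 0, 1, 1, 1 };   /* 1 = hit, 0 = miss */
    size_t i;

    for (i = 0; i < sizeof trace / sizeof trace[0]; i++)
        reference(&n, trace[i]);

    printf("simulated execution time: %lu cycles\n", n.vclock);
    return 0;
}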


Our ftp and Mosaic (http://www.cs.wisc.edu/p/wwt/Mosaic/wwt.html)
sites contain the following papers in compressed postscript:


annobib.ps.Z              An overview and annotated bibliography
asplos5_csm.ps.Z          First paper on cooperative shared memory
asplos6_fine_grain.ps.Z   Discusses fine-grain access control and Blizzard
asplos6_lcm.ps.Z          Loosely coherent memory (LCM) support for C**
asplos6_sm_mp.ps.Z        Compares 4 shared memory and message passing programs
tocs93_csm.ps.Z           Revised version of asplos5_csm
sigmetrics93_wwt.ps.Z     First paper on the Wisconsin Wind Tunnel (WWT)
isca93_mechanisms.ps.Z    Examines directory protocol complexity & performance
usenix93_kernel.ps.Z      Explores OS support for WWT
wwt_tutorial.ps.Z         Tutorial for new users of WWT
p4_cico.ps.Z              Describes check-in-check-out programming model
hw_sw_sm.ps.Z             Discussion of compiler and hardware shared memory
cce_electrostatics.ps.Z   Solving Microstructure Electrostatics on CSM
traenkle_ms.ps.Z          M.S. Thesis that elaborates on cce_electrostatics.ps.Z
isca94_typhoon.ps.Z       Tempest and Typhoon: user-level shared memory
pads94_costperf.ps.Z      Examines cost-performance of parallel simulation
icpp94_cachier.ps.Z       Cachier automatically inserts CICO annotations
ics94_directory.ps.Z      Proposes and evaluates multicast directory protocols
Misc/                     Directory of miscellaneous things (see READMEs)


FTP Directions:


ftp ftp.cs.wisc.edu
reply to login: anonymous
reply to passwd: type any non-null string here
binary
cd wwt
get README
get FILENAME1
get FILENAME2
...
bye


------------------------------------------------------------------------


@INPROCEEDINGS{schoinas:fine-grain,
        AUTHOR = "Ioannis Schoinas and Babak Falsafi and Alvin R. Lebeck
and Steven K. Reinhardt and James R. Larus and David A. Wood",
        TITLE = "Fine-grain Access Control for Distributed Shared Memory",
        BOOKTITLE = ASPLOS6,
        YEAR = 1994,
        MONTH = Oct,
        NOTE = "To appear."}


This paper discusses implementations of fine-grain memory access
control, which selectively restricts reads and writes to
cache-block-sized memory regions. Fine-grain access control forms the
basis of efficient cache-coherent shared memory. This paper focuses on
low-cost implementations that require little or no additional
hardware. These techniques permit efficient implementation of shared
memory on a wide range of parallel systems, thereby providing
shared-memory codes with a portability previously limited to message
passing.


This paper categorizes techniques based on where access control is
enforced and where access conflicts are handled. We incorporated
three techniques that require no additional hardware into Blizzard, a
system that supports distributed shared memory on the CM-5. The first
adds a software lookup before each shared-memory reference by
modifying the program's executable. The second uses the memory's
error correcting code (ECC) as cache-block valid bits. The third is a
hybrid. The software technique ranged from slightly faster to two
times slower than the ECC approach. Blizzard's performance is roughly
comparable to that of a hardware shared-memory machine. These results argue
that clusters of workstations or personal computers with networks
comparable to the CM-5's will be able to support the same
shared-memory interfaces as supercomputers.
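

As a rough illustration of the first technique (and not Blizzard's actual
code), the sketch below shows the kind of per-block state lookup a binary
rewriter could insert before each shared-memory load; the block size, the
state table, and miss_handler() are hypothetical.

/*
 * Sketch of a software access-control check inserted before a shared
 * load.  Invalid blocks divert to a user-level miss handler that would
 * fetch the data; here the handler only flips the state bit.
 */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SHIFT 5                          /* assume 32-byte blocks */
#define NBLOCKS     1024

enum state { INVALID = 0, READ_ONLY, READ_WRITE };

static uint8_t  block_state[NBLOCKS];          /* one tag per shared block */
static uint32_t shared_mem[(NBLOCKS << BLOCK_SHIFT) >> 2];

/* Placeholder for the protocol action that fetches a remote copy. */
static void miss_handler(size_t block)
{
    /* ...request the block from its home node, install the data... */
    block_state[block] = READ_ONLY;
}

/* The check a rewriter would place before each shared load. */
static uint32_t checked_load(size_t word_index)
{
    size_t block = (word_index << 2) >> BLOCK_SHIFT;
    if (block_state[block] == INVALID)
        miss_handler(block);                   /* software "access fault" */
    return shared_mem[word_index];
}

int main(void)
{
    printf("%u\n", (unsigned)checked_load(10)); /* first access faults, then reads 0 */
    return 0;
}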


------------------------------------------------------------------------


@INPROCEEDINGS{chandra:where,
        AUTHOR = "Satish Chandra and James R. Larus and Anne Rogers",
        TITLE = "Where is Time Spent in Message-Passing and Shared-Memory
Programs?",
        BOOKTITLE = ASPLOS6,
        YEAR = 1994,
        MONTH = Oct,
        NOTE = "To appear."}


Message passing and shared memory are two techniques parallel programs
use for coordination and communication. This paper studies the
strengths and weaknesses of these two mechanisms by comparing
equivalent, well-written message-passing and shared-memory programs
running on similar hardware. To ensure that our measurements are
comparable, we produced two carefully tuned versions of each program
and measured them on closely-related simulators of a message-passing
and a shared-memory machine, both of which are based on the same
underlying hardware assumptions.


We examined the behavior and performance of each program carefully.
Although the cost of computation in each pair of programs was similar,
synchronization and communication differed greatly. We found that
message-passing's advantage over shared-memory is not clear-cut.
Three of the four shared-memory programs ran at roughly the same speed
as their message-passing equivalents, even though their communication
patterns were different.
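

The toy sketch below is not taken from the paper; it only contrasts the two
styles of coordination for a boundary exchange.  The send/receive
"primitives" are stand-ins defined locally so the example is self-contained:
the message-passing version names the data and moves it explicitly, while
the shared-memory version simply loads a shared location and leaves data
movement to the memory system.

/*
 * Message passing vs. shared memory for exchanging a boundary row.
 * send_msg()/recv_msg() are local stand-ins, not a real message library.
 */
#include <stdio.h>
#include <string.h>

static double mailbox[16];                     /* stand-in for a network message */

static void send_msg(const double *buf, int n) { memcpy(mailbox, buf, n * sizeof *buf); }
static void recv_msg(double *buf, int n)       { memcpy(buf, mailbox, n * sizeof *buf); }

/* Message passing: pack, send, and receive an explicit private copy. */
static void exchange_boundary_mp(const double *my_row, double *neighbor_row, int n)
{
    send_msg(my_row, n);                       /* communication is explicit */
    recv_msg(neighbor_row, n);
}

/* Shared memory: both rows live in one shared array; an ordinary load
 * suffices, and the coherence protocol moves the data. */
static double shared_grid[2][16];

static double read_neighbor_sm(int neighbor, int col)
{
    return shared_grid[neighbor][col];         /* communication is implicit */
}

int main(void)
{
    double mine[16] = { 1.0 }, theirs[16];

    exchange_boundary_mp(mine, theirs, 16);
    shared_grid[1][0] = 2.0;
    printf("%.1f %.1f\n", theirs[0], read_neighbor_sm(1, 0));
    return 0;
}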


------------------------------------------------------------------------


@INPROCEEDINGS{larus:lcm,
        AUTHOR = "James R. Larus and Brad Richards and Guhan Viswanathan",
        TITLE = "LCM: Memory System Support for Parallel Language Implementation",
        BOOKTITLE = ASPLOS6,
        YEAR = 1994,
        MONTH = Oct,
        NOTE = "To appear."}


Higher-level parallel programming languages can be difficult to
implement efficiently on parallel machines. This paper shows how a
flexible, compiler-controlled memory system can help achieve good
performance for language constructs that previously appeared too
costly to be practical.


Our compiler-controlled memory system is called Loosely Coherent
Memory (LCM). It is an example of a larger class of Reconcilable
Shared Memory (RSM) systems, which generalize the replication and
merge policies of cache-coherent shared memory. RSM protocols differ
in the action taken by a processor in response to a "request" for a
location and the way in which a processor "reconciles" multiple
outstanding copies of a location. LCM memory becomes temporarily
inconsistent to implement the semantics of C** parallel functions
efficiently. RSM provides a compiler with control over memory-system
policies, which it can use to implement a language's semantics,
improve performance, or detect errors. We illustrate the first two
points with LCM and our compiler for the data-parallel language C**.
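

As a loose illustration of the reconcile step (not the actual LCM protocol),
the sketch below merges several diverged copies of a block against its
original contents with a sum-of-differences rule, one plausible policy for
reduction-style updates; the block size, copy count, and merge rule are all
assumptions made for the example.

/*
 * Reconciling diverged copies of a block: each processor's changes are
 * computed relative to the original value and combined into the master
 * copy.  The merge rule shown is illustrative only.
 */
#include <stdio.h>

#define WORDS  4
#define COPIES 3

static void reconcile(int *master, int copies[][WORDS], const int *original)
{
    int w, c;

    for (w = 0; w < WORDS; w++) {
        int merged = original[w];
        for (c = 0; c < COPIES; c++)
            merged += copies[c][w] - original[w];   /* fold in each copy's change */
        master[w] = merged;
    }
}

int main(void)
{
    int original[WORDS] = { 0, 0, 0, 0 };
    int copies[COPIES][WORDS] = { {1,0,0,0}, {0,2,0,0}, {0,0,3,0} };
    int master[WORDS];
    int w;

    reconcile(master, copies, original);
    for (w = 0; w < WORDS; w++)
        printf("%d ", master[w]);                   /* prints: 1 2 3 0 */
    printf("\n");
    return 0;
}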


------------------------------------------------------------------------

