CALL FOR PARTICIPATION: TUTORIAL on Embedded Streaming: Parallel Programming, Optimization and Tools

"harm.munk@NXP" <>
Mon, 18 May 2009 02:08:17 -0700 (PDT)

Newsgroups: comp.compilers
Organization: Compilers Central
Keywords: conference, parallel
Posted-Date: 18 May 2009 12:53:57 EDT



DATE and VENUE: June 12, 2009 (morning session), IBM T.J. Watson
Research Center in Yorktown Heights, NY

In conjunction with the 23rd ACM International Conference on
Supercomputing (ICS).


Streaming applications are based on a data-driven approach in which
components consume and produce unbounded streams of data. Streaming-oriented
systems have become dominant in a wide range of domains, including
embedded applications and DSPs (e.g., video processing and advanced
communications for consumer systems). However, programming streaming
architectures efficiently is very challenging. The programmer has to
carefully partition the computation and map it to processes in a way
that best matches the underlying multi-core streaming architecture.
The programmer must also take into account the required resources
(memory, real-time requirements, etc.) and the communication overheads
(processing and delay) between the processors.
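To make the data-driven model concrete, here is a minimal Python sketch. It is an illustration only, not ACOTES or StreamIt code. Each component is a generator that consumes items from its input stream and produces items on its output stream. A pipeline can therefore run over an unbounded stream, and only the finite prefix that is actually requested gets computed. The component names (`source`, `moving_sum`, `scale`, `build_pipeline`) and the 3-tap moving sum are hypothetical choices made for the example.

```python
from collections import deque
from itertools import count, islice
from typing import Iterable, Iterator


def source() -> Iterator[int]:
    """Produce an unbounded stream of samples (here simply 0, 1, 2, ...)."""
    yield from count()


def moving_sum(stream: Iterable[int], taps: int) -> Iterator[int]:
    """Stateful component: for each input sample, emit the sum of the last `taps` samples."""
    window = deque(maxlen=taps)  # keeps only the most recent `taps` samples
    for sample in stream:
        window.append(sample)
        yield sum(window)


def scale(stream: Iterable[int], factor: int) -> Iterator[int]:
    """Stateless component: multiply every item by a constant factor."""
    for item in stream:
        yield item * factor


def build_pipeline(taps: int = 3, factor: int = 2) -> Iterator[int]:
    """Connect the components: source -> moving_sum -> scale."""
    return scale(moving_sum(source(), taps), factor)


# Nothing is computed until items are requested; the stream itself never ends,
# so only a finite prefix is pulled from it.
print(list(islice(build_pipeline(), 6)))  # [0, 2, 6, 12, 18, 24]
```

In this single-threaded sketch the "communication" between components is just a generator hand-off. On a multi-core streaming target, each such link would become a buffer between processes, with the processing and delay overheads mentioned above.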

These challenges have led to a number of proposed solutions whose goal
is to improve programmer productivity when developing applications that
process massive streams of data on programmable, parallel embedded
architectures. StreamIt is one such example. A more recent approach is
the one developed by the ACOTES (Advanced Compiler Technologies for
Embedded Streaming) project. The ACOTES approach to streaming
applications consists of compiler-assisted mapping of streaming tasks
onto multi-processor systems. The aim is systems that are
cost-effective both in terms of energy and in terms of design cost. The
analysis and transformation techniques automate large parts of the
partitioning and mapping process. They base this on the properties of
the application domain, on quantitative information about the target
systems, and on programmer directives.

In the tutorial we will review the streaming domain, including its main
trends, typical applications and architectures, programming challenges,
and available solutions. We will then focus on presenting and
demonstrating the framework developed by the ACOTES project.
ACOTES brings together partners from both industry and academia, whose
goal is to improve programmer productivity by using (1) automatic
simulation and compilation techniques that abstract the underlying
multi-core hardware away from the programmer, and (2) programmer hints
(pragmas) that define the inputs, outputs and control variables of the
computation, indicating to the underlying compilation system where the
borders of the components are. The actual components are then built
based on an abstract representation of the platform called the
Abstract Streaming Machine (ASM). The ASM describes the available
thread-level and data-level parallel processing capabilities, as well
as the communication overhead (processing and delay) between the
processors. The automatic compiler transformations then base their
parallelism-related optimization decisions on the pragmas and on the
resources needed by each constructed component mapped to each processor.
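The kind of mapping decision that a machine description such as the ASM is meant to inform can be illustrated with a toy cost model. The sketch below is hypothetical and far simpler than the real ASM. It makes three assumptions:

* all processors are identical;
* every stream that crosses processors charges a single per-item communication cost to both its sender and its receiver;
* the best mapping is the one that minimizes the load of the busiest processor (the throughput bottleneck), found by exhaustive search.

`Machine`, `Component`, `bottleneck` and `best_mapping` are illustrative names, not part of ACOTES.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Machine:
    """Hypothetical, much-simplified machine description (not the actual ASM)."""
    processors: int   # number of identical processors
    comm_cost: float  # per-item overhead paid by the sender and by the receiver


@dataclass(frozen=True)
class Component:
    name: str
    work: float       # processing cost per item


def bottleneck(machine: Machine,
               components: Sequence[Component],
               edges: Sequence[Tuple[int, int]],
               mapping: Sequence[int]) -> float:
    """Per-item load of the busiest processor; mapping[i] is component i's processor."""
    load: List[float] = [0.0] * machine.processors
    for i, comp in enumerate(components):
        load[mapping[i]] += comp.work
    for src, dst in edges:
        if mapping[src] != mapping[dst]:
            # A stream crossing processors costs overhead on both ends.
            load[mapping[src]] += machine.comm_cost
            load[mapping[dst]] += machine.comm_cost
    return max(load)


def best_mapping(machine: Machine,
                 components: Sequence[Component],
                 edges: Sequence[Tuple[int, int]]) -> Tuple[int, ...]:
    """Try every assignment exhaustively (only feasible for tiny graphs)."""
    candidates = product(range(machine.processors), repeat=len(components))
    return min(candidates,
               key=lambda m: bottleneck(machine, components, edges, m))


chain = [Component("A", 4.0), Component("B", 4.0), Component("C", 1.0)]
streams = [(0, 1), (1, 2)]  # A -> B -> C
print(best_mapping(Machine(processors=2, comm_cost=1.0), chain, streams))   # (0, 1, 1)
print(best_mapping(Machine(processors=2, comm_cost=10.0), chain, streams))  # (0, 0, 0)
```

With a cheap link, the two heavy components end up on different processors (bottleneck 6 instead of 9). With a per-item communication cost of 10, keeping the whole chain on one processor is cheaper. This is the trade-off between parallelism and communication overhead described above. A real compiler would use a much richer machine model and heuristics rather than exhaustive search.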

We will walk through a hands-on example of a streaming program, from
writing it with the special pragmas, through its multiple levels of
compilation, all the way to actual execution on a real streaming
architecture.

The topics that will be covered include:
* Introduction: The Streaming Domain - characteristics of
streaming applications and architectures, challenges, examples.
* Abstraction for Streaming Architectures
* Streaming Programming Models - the ACOTES model vs. state-of-the-art
languages and models.
* Compiler Optimizations for Thread/Data-level Parallelism
(the polyhedral model, loop-nest optimizations, vectorization, and the
interaction between the two).
* Split Compilation and CLI


In this tutorial we present and demonstrate the outcomes of the ACOTES
project, a 3-year collaborative effort of industrial (NXP, ST, IBM,
Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech)
partners. We also advocate the use of the Advanced Compiler
Technologies that we developed to support Embedded Streaming.


Albert Cohen, INRIA
Xavier Martorell, UPC
Harm Munk, NXP
Dorit Nuzman, IBM
Andrea Ornstein, STMicroelectronics
Sebastian Pop, AMD
Uzi Shvadron, IBM
Ayal Zaks, IBM


