Chunks and Tasks: A programming model for parallelization of dynamic algorithms



Detailed bibliography
Published in: Parallel Computing, Volume 40, Issue 7, pp. 328–343
Main authors: Rubensson, Emanuel H.; Rudberg, Elias
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.07.2014
ISSN: 0167-8191, 1872-7336
Description
Summary:
Highlights:
• We present a new parallel programming model named Chunks and Tasks.
• Designed to work well for dynamic algorithms.
• The programmer does not need to provide a data distribution.
• Fault resilience can be achieved at the library level.
• Expands the applicability of high performance parallel computing.

We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces: chunks and tasks, respectively. The Chunks and Tasks library then maps the chunks and tasks to physical resources. In this way we seek to combine user friendliness with high performance. An application programmer can express a parallel algorithm using a few simple building blocks, defining data and work objects and their relationships. No explicit communication calls are needed; the distribution of both work and data is handled by the Chunks and Tasks library. This simplifies the efficient implementation of complex applications that require dynamic distribution of work and data. At the same time, Chunks and Tasks imposes restrictions on data access and task dependencies that facilitate the development of high performance parallel back ends. We discuss the fundamental abstractions underlying the programming model, as well as performance, determinism, and fault resilience considerations. We also present a pilot C++ library implementation for clusters of multicore machines and demonstrate its performance for irregular block-sparse matrix–matrix multiplication.
DOI:10.1016/j.parco.2013.09.006