DuctTeip: An efficient programming model for distributed task-based parallel computing

• We introduce a hierarchical task parallel programming model for distributed memory systems.
• We show that the new model provides both flexibility and performance.
• We use the model to implement a Cholesky factorization and a solver for the shallow water equations.
• We have compared our implementation...

Full description

Bibliographic details
Published in: Parallel Computing, vol. 90, p. 102582
Main authors: Zafari, Afshin; Larsson, Elisabeth; Tillenius, Martin
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 December 2019
ISSN: 0167-8191, 1872-7336
Online access: Full text
Abstract
• We introduce a hierarchical task parallel programming model for distributed memory systems.
• We show that the new model provides both flexibility and performance.
• We use the model to implement a Cholesky factorization and a solver for the shallow water equations.
• We have compared our implementation with other frameworks and shown that it is competitive.

Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored approaches. Task-based parallel programming has been successful both in simplifying programming and in exploiting the available hardware parallelism on shared memory systems. In this paper we focus on how to extend task-parallel programming to distributed memory systems. We use a hierarchical decomposition of tasks and data to accommodate the different levels of hardware. We test the proposed programming model on two applications: a Cholesky factorization and a solver for the shallow water equations. We also compare the performance of our implementation with that of other frameworks for distributed task-parallel programming, and show that it is competitive.
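Illustrative note (not part of the catalog record): the abstract describes a Cholesky factorization expressed as tasks. The sketch below shows, in plain Python, how a blocked Cholesky factorization can be written as a list of tasks whose dependencies follow from the order in which they access data blocks. It is not the DuctTeip API; the names `Task`, `cholesky_tasks` and `dependencies` are hypothetical, and only one level of decomposition is shown. In a hierarchical model, each block-level task would presumably be split again into finer tasks on sub-blocks.

```python
from collections import namedtuple

# Hypothetical task record: a kernel name, the blocks it reads,
# and the single block it updates in place.
Task = namedtuple("Task", "kernel reads writes")


def cholesky_tasks(nb):
    """List the tasks of a right-looking blocked Cholesky on an nb x nb block grid."""
    tasks = []
    for k in range(nb):
        tasks.append(Task("potrf", [], (k, k)))  # factor the diagonal block
        for i in range(k + 1, nb):
            tasks.append(Task("trsm", [(k, k)], (i, k)))  # solve for a panel block
        for i in range(k + 1, nb):
            tasks.append(Task("syrk", [(i, k)], (i, i)))  # update a diagonal block
            for j in range(k + 1, i):
                # update an off-diagonal block of the trailing matrix
                tasks.append(Task("gemm", [(i, k), (j, k)], (i, j)))
    return tasks


def dependencies(tasks):
    """For each task, list the earlier tasks that last wrote a block it accesses.

    Simplification: only read-after-write and write-after-write orderings are
    tracked, which is sufficient for this particular task list because a block
    is never written again after it has been read as an input.
    """
    last_writer = {}
    deps = []
    for index, task in enumerate(tasks):
        needed = set()
        for block in task.reads + [task.writes]:
            if block in last_writer:
                needed.add(last_writer[block])
        deps.append(sorted(needed))
        last_writer[task.writes] = index
    return deps
```

For a 2 × 2 block grid, `dependencies(cholesky_tasks(2))` returns `[[], [0], [1], [2]]`, a purely sequential chain. With 3 × 3 blocks the two `trsm` tasks of the first step depend only on the first `potrf`, so a runtime could execute them concurrently.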
ArticleNumber 102582
Author Zafari, Afshin
Larsson, Elisabeth
Tillenius, Martin
Author_xml – sequence: 1
  givenname: Afshin
  surname: Zafari
  fullname: Zafari, Afshin
– sequence: 2
  givenname: Elisabeth
  orcidid: 0000-0003-1154-9587
  surname: Larsson
  fullname: Larsson, Elisabeth
  email: elisabeth.larsson@it.uu.se
– sequence: 3
  givenname: Martin
  surname: Tillenius
  fullname: Tillenius, Martin
BackLink https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-338832 (View record from Swedish Publication Index, Uppsala universitet)
CitedBy_id crossref_primary_10_1007_s11128_023_04155_2
crossref_primary_10_1007_s00607_023_01190_w
crossref_primary_10_7717_peerj_cs_2966
crossref_primary_10_1007_s00521_022_07559_w
crossref_primary_10_1155_2021_6639008
crossref_primary_10_1016_j_bspc_2024_106358
crossref_primary_10_1016_j_parco_2023_103052
crossref_primary_10_1016_j_inffus_2021_02_008
Cites_doi 10.1002/cpe.1631
10.1006/jpdc.1996.0107
10.1109/CLUSTER.2014.6968739
10.1137/130943595
10.1145/2686892
10.1016/j.jcp.2015.06.003
10.1142/S0129626411000151
10.1007/978-3-319-78024-5_16
10.1177/1094342016635723
10.1145/2641764
10.1109/MCSE.2013.98
10.1137/140989716
10.1145/1556444.1556457
10.1145/2638554
10.1016/j.parco.2011.10.003
10.1016/j.parco.2013.09.006
ContentType Journal Article
Copyright 2019 Elsevier B.V.
Copyright_xml – notice: 2019 Elsevier B.V.
DOI 10.1016/j.parco.2019.102582
DatabaseName CrossRef
SWEPUB Uppsala universitet full text
SwePub
SwePub Articles
SWEPUB Freely available online
SWEPUB Uppsala universitet
SwePub Articles full text
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-7336
ExternalDocumentID oai_DiVA_org_uu_338832
10_1016_j_parco_2019_102582
S0167819119301735
GrantInformation_xml – fundername: Swedish Research Council
  funderid: https://doi.org/10.13039/501100004359
ISICitedReferencesCount 11
ISSN 0167-8191
1872-7336
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Hierarchical decomposition
Distributed memory system
Scientific computing
High performance computing
Task-based parallel programming
65Y10
65Y05
68Q10
Data versioning
Language English
LinkModel OpenURL
ORCID 0000-0003-1154-9587
OpenAccessLink https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-338832
ParticipantIDs swepub_primary_oai_DiVA_org_uu_338832
crossref_primary_10_1016_j_parco_2019_102582
crossref_citationtrail_10_1016_j_parco_2019_102582
elsevier_sciencedirect_doi_10_1016_j_parco_2019_102582
PublicationCentury 2000
PublicationDate 2019-12-01
PublicationDateYYYYMMDD 2019-12-01
PublicationDate_xml – month: 12
  year: 2019
  text: 2019-12-01
  day: 01
PublicationDecade 2010
PublicationTitle Parallel computing
PublicationYear 2019
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
SourceID swepub
crossref
elsevier
SourceType Open Access Repository
Enrichment Source
Index Database
Publisher
StartPage 102582
SubjectTerms Data versioning
Distributed memory system
Hierarchical decomposition
High performance computing
Scientific computing
Task-based parallel programming
Title DuctTeip: An efficient programming model for distributed task-based parallel computing
URI https://dx.doi.org/10.1016/j.parco.2019.102582
https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-338832
Volume 90