DuctTeip: An efficient programming model for distributed task-based parallel computing

• We introduce a hierarchical task parallel programming model for distributed memory systems.
• We show that the new model provides both flexibility and performance.
• We use the model to implement a Cholesky factorization and a solver for the shallow water equations.
• We have compared our implementation...

Detailed description

Bibliographic details
Published in:	Parallel Computing, vol. 90, p. 102582
Main authors:	Zafari, Afshin; Larsson, Elisabeth; Tillenius, Martin
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 December 2019
Keywords:	Data versioning; Distributed memory system; Hierarchical decomposition; High performance computing; Scientific computing; Task-based parallel programming; 65Y10; 65Y05; 68Q10
ISSN:	0167-8191, 1872-7336
Online access:	Full text

Highlights	• We introduce a hierarchical task parallel programming model for distributed memory systems. • We show that the new model provides both flexibility and performance. • We use the model to implement a Cholesky factorization and a solver for the shallow water equations. • We have compared our implementation with other frameworks and shown that it is competitive.
Abstract	Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored approaches. Task-based parallel programming has been successful both in simplifying the programming and in exploiting the available hardware parallelism for shared memory systems. In this paper we focus on how to extend task-parallel programming to distributed memory systems. We use a hierarchical decomposition of tasks and data in order to accommodate the different levels of hardware. We test the proposed programming model on two different applications, a Cholesky factorization, and a solver for the Shallow Water Equations. We also compare the performance of our implementation with that of other frameworks for distributed task-parallel programming, and show that it is competitive.
ArticleNumber	102582
Author	Zafari, Afshin; Larsson, Elisabeth (ORCID: 0000-0003-1154-9587; email: elisabeth.larsson@it.uu.se); Tillenius, Martin
BackLink	https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-338832 (record in the Swedish Publication Index, Uppsala universitet)
ContentType	Journal Article
Copyright	2019 Elsevier B.V.
DOI	10.1016/j.parco.2019.102582
Discipline	Computer Science
EISSN	1872-7336
GrantInformation	Swedish Research Council (funder ID: https://doi.org/10.13039/501100004359)
ISICitedReferencesCount	11
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
ORCID	0000-0003-1154-9587
OpenAccessLink	https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-338832