Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. The sequential task-based programming model has been shown to efficiently meet this challenge on a single m...

Detailed bibliography
Published in:	IEEE Transactions on Parallel and Distributed Systems, p. 1
Main authors:	Agullo, Emmanuel; Aumage, Olivier; Faverge, Mathieu; Furmento, Nathalie; Pruvost, Florent; Sergent, Marc; Thibault, Samuel Paul
Format:	Journal Article
Language:	English
Published:	IEEE (Institute of Electrical and Electronics Engineers), 18 December 2017
Subjects:	Algorithm design and analysis; Cholesky factorization; Computer Science; distributed computing; Distributed, Parallel, and Cluster Computing; GPU; heterogeneous computing; Libraries; multicore; Productivity; Programming; Runtime; runtime system; sequential task flow; Supercomputers; task-based programming
ISSN:	1045-9219, 1558-2183

Abstract	The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. The sequential task-based programming model has been shown to efficiently meet this challenge on a single multicore node possibly enhanced with accelerators, which motivated its support in the OpenMP 4.0 standard. In this paper, we show that this paradigm can also be employed to achieve high performance on modern supercomputers composed of multiple such nodes, with extremely limited changes in the user code. To prove this claim, we have extended the StarPU runtime system with an advanced inter-node data management layer that supports this model by posting communications automatically. We illustrate our discussion with the task-based tile Cholesky algorithm that we implemented on top of this new runtime system layer. We show that it allows for very high productivity while achieving a performance competitive with both the pure Message Passing Interface (MPI)-based ScaLAPACK Cholesky reference implementation and the DPLASMA Cholesky code, which implements another (non-sequential) task-based programming paradigm.
Author	Pruvost, Florent Thibault, Samuel Paul Faverge, Mathieu Aumage, Olivier Furmento, Nathalie Sergent, Marc Agullo, Emmanuel
Author_xml	– sequence: 1 givenname: Emmanuel surname: Agullo fullname: Agullo, Emmanuel organization: HiePACS, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: emmanuel.agullo@inria.fr) – sequence: 2 givenname: Olivier surname: Aumage fullname: Aumage, Olivier organization: STORM, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: olivier.aumage@inria.fr) – sequence: 3 givenname: Mathieu surname: Faverge fullname: Faverge, Mathieu organization: HiePACS, Bordeaux INP, Talence, Aquitaine France (e-mail: mathieu.faverge@inria.fr) – sequence: 4 givenname: Nathalie surname: Furmento fullname: Furmento, Nathalie organization: STORM, LaBRI, TALENCE, Aquitaine France (e-mail: nathalie.furmento@labri.fr) – sequence: 5 givenname: Florent surname: Pruvost fullname: Pruvost, Florent organization: HiePACS, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: florent.pruvost@inria.fr) – sequence: 6 givenname: Marc surname: Sergent fullname: Sergent, Marc organization: STORM, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: marc.sergent@inria.fr) – sequence: 7 givenname: Samuel Paul surname: Thibault fullname: Thibault, Samuel Paul organization: Computer science, LaBRI, TALENCE, - France 33405 (e-mail: samuel.thibault@u-bordeaux.fr)
BackLink	https://inria.hal.science/hal-01618526$$DView record in HAL
CODEN	ITDSEO
CitedBy_id	crossref_primary_10_1145_3743134 crossref_primary_10_1002_cpe_4472 crossref_primary_10_1007_s00607_023_01190_w crossref_primary_10_1007_s11227_022_04355_0 crossref_primary_10_1109_TPDS_2021_3084071 crossref_primary_10_1002_cpe_7920 crossref_primary_10_1109_TPDS_2020_2992923 crossref_primary_10_1002_cpe_4490 crossref_primary_10_1002_hyp_13722 crossref_primary_10_1007_s10766_018_0619_1 crossref_primary_10_1177_10943420241286531 crossref_primary_10_1109_TPDS_2021_3131657 crossref_primary_10_15803_ijnc_13_1_62 crossref_primary_10_1145_3583560
ContentType	Journal Article
Copyright	Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml	– notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID	97E RIA RIE AAYXX CITATION 1XC VOOES
DOI	10.1109/TPDS.2017.2766064
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998-Present IEEE Electronic Library (IEL) CrossRef Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access)
DatabaseTitle	CrossRef
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering Computer Science
EISSN	1558-2183
EndPage	1
ExternalDocumentID	oai:HAL:hal-01618526v1 10_1109_TPDS_2017_2766064 8226789
Genre	orig-research
GrantInformation_xml	– fundername: ANR SOLHAR grantid: ANR-13-MONU-0007
IEDL.DBID	RIE
ISSN	1045-9219
IngestDate	Sat Nov 29 15:00:52 EST 2025 Sat Nov 29 06:06:46 EST 2025 Tue Nov 18 22:00:30 EST 2025 Wed Aug 27 02:13:04 EDT 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	task-based programming heterogeneous computing GPU multicore Cholesky factorization distributed computing runtime system sequential task flow
Language	English
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel	DirectLink
ORCID	0000-0002-5406-8743 0000-0001-6411-809X 0000-0002-2128-1230 0000-0003-0655-6934 0000-0003-2824-2370
OpenAccessLink	https://inria.hal.science/hal-01618526
PageCount	1
ParticipantIDs	hal_primary_oai_HAL_hal_01618526v1 ieee_primary_8226789 crossref_citationtrail_10_1109_TPDS_2017_2766064 crossref_primary_10_1109_TPDS_2017_2766064
PublicationCentury	2000
PublicationDate	2017-12-18
PublicationDateYYYYMMDD	2017-12-18
PublicationDate_xml	– month: 12 year: 2017 text: 2017-12-18 day: 18
PublicationDecade	2010
PublicationTitle	IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev	TPDS
PublicationYear	2017
Publisher	IEEE Institute of Electrical and Electronics Engineers
Publisher_xml	– name: IEEE – name: Institute of Electrical and Electronics Engineers
SSID	ssj0014504
Score	2.4505525
Snippet	The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for...
SourceID	hal crossref ieee
SourceType	Open Access Repository Enrichment Source Index Database Publisher
StartPage	1
SubjectTerms	Algorithm design and analysis Cholesky factorization Computer Science distributed computing Distributed, Parallel, and Cluster Computing GPU heterogeneous computing Libraries multicore Productivity Programming Runtime runtime system sequential task flow Supercomputers task-based programming
Title	Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model
URI	https://ieeexplore.ieee.org/document/8226789 https://inria.hal.science/hal-01618526
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
journalDatabaseRights	– providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1558-2183 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0014504 issn: 1045-9219 databaseCode: RIE dateStart: 19900101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE