Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. The sequential task-based programming model has been shown to efficiently meet this challenge on a single m...

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, p. 1
Main authors: Agullo, Emmanuel; Aumage, Olivier; Faverge, Mathieu; Furmento, Nathalie; Pruvost, Florent; Sergent, Marc; Thibault, Samuel Paul
Format: Journal Article
Language: English
Published: IEEE, 18 December 2017
Institute of Electrical and Electronics Engineers
ISSN: 1045-9219, 1558-2183
Abstract The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. The sequential task-based programming model has been shown to efficiently meet this challenge on a single multicore node possibly enhanced with accelerators, which motivated its support in the OpenMP 4.0 standard. In this paper, we show that this paradigm can also be employed to achieve high performance on modern supercomputers composed of multiple such nodes, with extremely limited changes in the user code. To prove this claim, we have extended the StarPU runtime system with an advanced inter-node data management layer that supports this model by posting communications automatically. We illustrate our discussion with the task-based tile Cholesky algorithm that we implemented on top of this new runtime system layer. We show that it allows for very high productivity while achieving a performance competitive with both the pure Message Passing Interface (MPI)-based ScaLAPACK Cholesky reference implementation and the DPLASMA Cholesky code, which implements another (non-sequential) task-based programming paradigm.
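The two ideas in the abstract, a sequential task flow and communications posted automatically, can be illustrated with a small sketch. The Python below is not StarPU code and does not reproduce the authors' implementation; every name in it (`Task`, `tile_cholesky_tasks`, `infer_dependencies`, `infer_transfers`) is hypothetical. It assumes a right-looking tile Cholesky built from the usual POTRF, TRSM, SYRK and GEMM kernels, an owner-computes rule (a task runs on the rank that owns the tile it writes), and a 2D block-cyclic mapping of tiles onto a `p` x `q` grid of ranks. Only the task graph and the implied transfers are computed; no numerical work is done.

```python
from collections import namedtuple

# A task names a kernel, the tiles it only reads, and the tiles it updates in place.
Task = namedtuple("Task", "id kernel reads writes")


def tile_cholesky_tasks(nt):
    """Submit the tasks of a tile Cholesky on an nt x nt tile matrix, in sequential order."""
    tasks = []

    def submit(kernel, reads, writes):
        tasks.append(Task(len(tasks), kernel, tuple(reads), tuple(writes)))

    for k in range(nt):
        submit("POTRF", [], [(k, k)])
        for m in range(k + 1, nt):
            submit("TRSM", [(k, k)], [(m, k)])
        for n in range(k + 1, nt):
            submit("SYRK", [(n, k)], [(n, n)])
            for m in range(n + 1, nt):
                submit("GEMM", [(m, k), (n, k)], [(m, n)])
    return tasks


def infer_dependencies(tasks):
    """Derive task dependencies purely from submission order and access modes."""
    last_writer = {}   # tile -> id of the last task that wrote it
    readers = {}       # tile -> ids of tasks that read it since that write
    deps = {t.id: set() for t in tasks}
    for t in tasks:
        for tile in t.reads:
            if tile in last_writer:                 # read-after-write
                deps[t.id].add(last_writer[tile])
            readers.setdefault(tile, []).append(t.id)
        for tile in t.writes:
            if tile in last_writer:                 # write-after-write
                deps[t.id].add(last_writer[tile])
            deps[t.id].update(readers.get(tile, []))  # write-after-read
            last_writer[tile] = t.id
            readers[tile] = []
        deps[t.id].discard(t.id)
    return deps


def infer_transfers(tasks, p, q):
    """List (tile, source rank, destination rank) transfers implied by the data mapping."""

    def owner(tile):
        i, j = tile
        return (i % p) * q + (j % q)   # assumed 2D block-cyclic distribution

    seen, transfers = set(), []
    for t in tasks:
        dst = owner(t.writes[0])       # owner-computes: run where the written tile lives
        for tile in t.reads:
            msg = (tile, owner(tile), dst)
            # Tiles read here are already final, so one transfer per destination suffices.
            if msg[1] != dst and msg not in seen:
                seen.add(msg)
                transfers.append(msg)
    return transfers


if __name__ == "__main__":
    tasks = tile_cholesky_tasks(3)
    deps = infer_dependencies(tasks)
    for t in tasks:
        print(t.id, t.kernel, t.writes[0], "after", sorted(deps[t.id]))
    print(infer_transfers(tasks, p=2, q=1))
```

The point of the sketch is that the loop nest in `tile_cholesky_tasks` stays the same whether it targets one node or several. Only the data mapping passed to `infer_transfers` is new, which is one way to read the abstract's claim of "extremely limited changes in the user code".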
Author Pruvost, Florent
Thibault, Samuel Paul
Faverge, Mathieu
Aumage, Olivier
Furmento, Nathalie
Sergent, Marc
Agullo, Emmanuel
Author_xml – sequence: 1
  givenname: Emmanuel
  surname: Agullo
  fullname: Agullo, Emmanuel
  organization: HiePACS, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: emmanuel.agullo@inria.fr)
– sequence: 2
  givenname: Olivier
  surname: Aumage
  fullname: Aumage, Olivier
  organization: STORM, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: olivier.aumage@inria.fr)
– sequence: 3
  givenname: Mathieu
  surname: Faverge
  fullname: Faverge, Mathieu
  organization: HiePACS, Bordeaux INP, Talence, Aquitaine France (e-mail: mathieu.faverge@inria.fr)
– sequence: 4
  givenname: Nathalie
  surname: Furmento
  fullname: Furmento, Nathalie
  organization: STORM, LaBRI, TALENCE, Aquitaine France (e-mail: nathalie.furmento@labri.fr)
– sequence: 5
  givenname: Florent
  surname: Pruvost
  fullname: Pruvost, Florent
  organization: HiePACS, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: florent.pruvost@inria.fr)
– sequence: 6
  givenname: Marc
  surname: Sergent
  fullname: Sergent, Marc
  organization: STORM, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: marc.sergent@inria.fr)
– sequence: 7
  givenname: Samuel Paul
  surname: Thibault
  fullname: Thibault, Samuel Paul
  organization: Computer science, LaBRI, TALENCE, - France 33405 (e-mail: samuel.thibault@u-bordeaux.fr)
BackLink https://inria.hal.science/hal-01618526$$DView record in HAL
CODEN ITDSEO
CitedBy_id crossref_primary_10_1145_3743134
crossref_primary_10_1002_cpe_4472
crossref_primary_10_1007_s00607_023_01190_w
crossref_primary_10_1007_s11227_022_04355_0
crossref_primary_10_1109_TPDS_2021_3084071
crossref_primary_10_1002_cpe_7920
crossref_primary_10_1109_TPDS_2020_2992923
crossref_primary_10_1002_cpe_4490
crossref_primary_10_1002_hyp_13722
crossref_primary_10_1007_s10766_018_0619_1
crossref_primary_10_1177_10943420241286531
crossref_primary_10_1109_TPDS_2021_3131657
crossref_primary_10_15803_ijnc_13_1_62
crossref_primary_10_1145_3583560
ContentType Journal Article
Copyright Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID 97E
RIA
RIE
AAYXX
CITATION
1XC
VOOES
DOI 10.1109/TPDS.2017.2766064
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Hyper Article en Ligne (HAL)
Hyper Article en Ligne (HAL) (Open Access)
DatabaseTitle CrossRef
DatabaseTitleList

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1558-2183
EndPage 1
ExternalDocumentID oai:HAL:hal-01618526v1
10_1109_TPDS_2017_2766064
8226789
Genre orig-research
GrantInformation_xml – fundername: ANR SOLHAR
  grantid: ANR-13-MONU-0007
ID FETCH-LOGICAL-c2574-ec00701ff03fb15dbc113ac6582e41a1476b3d90ea9d3330ab6cec6dbc3ecec93
IEDL.DBID RIE
ISSN 1045-9219
IngestDate Sat Nov 29 15:00:52 EST 2025
Sat Nov 29 06:06:46 EST 2025
Tue Nov 18 22:00:30 EST 2025
Wed Aug 27 02:13:04 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords task-based programming
heterogeneous computing
GPU
multicore
Cholesky factorization
distributed computing
runtime system
sequential task flow
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c2574-ec00701ff03fb15dbc113ac6582e41a1476b3d90ea9d3330ab6cec6dbc3ecec93
ORCID 0000-0002-5406-8743
0000-0001-6411-809X
0000-0002-2128-1230
0000-0003-0655-6934
0000-0003-2824-2370
OpenAccessLink https://inria.hal.science/hal-01618526
PageCount 1
ParticipantIDs hal_primary_oai_HAL_hal_01618526v1
ieee_primary_8226789
crossref_citationtrail_10_1109_TPDS_2017_2766064
crossref_primary_10_1109_TPDS_2017_2766064
PublicationCentury 2000
PublicationDate 2017-12-18
PublicationDateYYYYMMDD 2017-12-18
PublicationDate_xml – month: 12
  year: 2017
  text: 2017-12-18
  day: 18
PublicationDecade 2010
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2017
Publisher IEEE
Institute of Electrical and Electronics Engineers
Publisher_xml – name: IEEE
– name: Institute of Electrical and Electronics Engineers
SSID ssj0014504
Score 2.4505525
Snippet The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for...
SourceID hal
crossref
ieee
SourceType Open Access Repository
Enrichment Source
Index Database
Publisher
StartPage 1
SubjectTerms Algorithm design and analysis
Cholesky factorization
Computer Science
distributed computing
Distributed, Parallel, and Cluster Computing
GPU
heterogeneous computing
Libraries
multicore
Productivity
Programming
Runtime
runtime system
sequential task flow
Supercomputers
task-based programming
Title Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model
URI https://ieeexplore.ieee.org/document/8226789
https://inria.hal.science/hal-01618526
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1558-2183
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014504
  issn: 1045-9219
  databaseCode: RIE
  dateStart: 19900101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE