Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model
The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. Sequential task-based programming model has been shown to efficiently meet this challenge on a single m...
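| Published in: | IEEE transactions on parallel and distributed systems, p. 1 |
|---|---|
| Main authors: | Agullo, Emmanuel; Aumage, Olivier; Faverge, Mathieu; Furmento, Nathalie; Pruvost, Florent; Sergent, Marc; Thibault, Samuel Paul |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 18.12.2017 (Institute of Electrical and Electronics Engineers) |
| Subjects: | task-based programming; heterogeneous computing; GPU; multicore; Cholesky factorization; distributed computing; runtime system; sequential task flow |
| ISSN: | 1045-9219, 1558-2183 |
| Online access: | Get full text |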
| Abstract | The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. The sequential task-based programming model has been shown to efficiently meet this challenge on a single multicore node possibly enhanced with accelerators, which motivated its support in the OpenMP 4.0 standard. In this paper, we show that this paradigm can also be employed to achieve high performance on modern supercomputers composed of multiple such nodes, with extremely limited changes in the user code. To prove this claim, we have extended the StarPU runtime system with an advanced inter-node data management layer that supports this model by posting communications automatically. We illustrate our discussion with the task-based tile Cholesky algorithm that we implemented on top of this new runtime system layer. We show that it allows for very high productivity while achieving a performance competitive with both the pure Message Passing Interface (MPI)-based ScaLAPACK Cholesky reference implementation and the DPLASMA Cholesky code, which implements another (non-sequential) task-based programming paradigm. |
|---|---|
| Author | Pruvost, Florent; Thibault, Samuel Paul; Faverge, Mathieu; Aumage, Olivier; Furmento, Nathalie; Sergent, Marc; Agullo, Emmanuel |
| Author_xml | – sequence: 1 givenname: Emmanuel surname: Agullo fullname: Agullo, Emmanuel organization: HiePACS, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: emmanuel.agullo@inria.fr) – sequence: 2 givenname: Olivier surname: Aumage fullname: Aumage, Olivier organization: STORM, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: olivier.aumage@inria.fr) – sequence: 3 givenname: Mathieu surname: Faverge fullname: Faverge, Mathieu organization: HiePACS, Bordeaux INP, Talence, Aquitaine France (e-mail: mathieu.faverge@inria.fr) – sequence: 4 givenname: Nathalie surname: Furmento fullname: Furmento, Nathalie organization: STORM, LaBRI, TALENCE, Aquitaine France (e-mail: nathalie.furmento@labri.fr) – sequence: 5 givenname: Florent surname: Pruvost fullname: Pruvost, Florent organization: HiePACS, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: florent.pruvost@inria.fr) – sequence: 6 givenname: Marc surname: Sergent fullname: Sergent, Marc organization: STORM, Inria Centre de recherche Bordeaux Sud-Ouest, 113923 Talence, Aquitaine France (e-mail: marc.sergent@inria.fr) – sequence: 7 givenname: Samuel Paul surname: Thibault fullname: Thibault, Samuel Paul organization: Computer science, LaBRI, TALENCE, - France 33405 (e-mail: samuel.thibault@u-bordeaux.fr) |
| BackLink | https://inria.hal.science/hal-01618526 (View record in HAL) |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1145_3743134 crossref_primary_10_1002_cpe_4472 crossref_primary_10_1007_s00607_023_01190_w crossref_primary_10_1007_s11227_022_04355_0 crossref_primary_10_1109_TPDS_2021_3084071 crossref_primary_10_1002_cpe_7920 crossref_primary_10_1109_TPDS_2020_2992923 crossref_primary_10_1002_cpe_4490 crossref_primary_10_1002_hyp_13722 crossref_primary_10_1007_s10766_018_0619_1 crossref_primary_10_1177_10943420241286531 crossref_primary_10_1109_TPDS_2021_3131657 crossref_primary_10_15803_ijnc_13_1_62 crossref_primary_10_1145_3583560 |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| Copyright_xml | – notice: Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.1109/TPDS.2017.2766064 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Hyper Article en Ligne (HAL); Hyper Article en Ligne (HAL) (Open Access) |
| DatabaseTitle | CrossRef |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| Discipline | Engineering Computer Science |
| EISSN | 1558-2183 |
| EndPage | 1 |
| ExternalDocumentID | oai:HAL:hal-01618526v1 10_1109_TPDS_2017_2766064 8226789 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: ANR SOLHAR grantid: ANR-13-MONU-0007 |
| ISSN | 1045-9219 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | task-based programming; heterogeneous computing; GPU; multicore; Cholesky factorization; distributed computing; runtime system; sequential task flow |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0002-5406-8743 0000-0001-6411-809X 0000-0002-2128-1230 0000-0003-0655-6934 0000-0003-2824-2370 |
| OpenAccessLink | https://inria.hal.science/hal-01618526 |
| PageCount | 1 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-12-18 |
| PublicationDateYYYYMMDD | 2017-12-18 |
| PublicationDate_xml | – month: 12 year: 2017 text: 2017-12-18 day: 18 |
| PublicationDecade | 2010 |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2017 |
| Publisher | IEEE Institute of Electrical and Electronics Engineers |
| Publisher_xml | – name: IEEE – name: Institute of Electrical and Electronics Engineers |
| SourceID | hal crossref ieee |
| SourceType | Open Access Repository; Enrichment Source; Index Database; Publisher |
| StartPage | 1 |
| SubjectTerms | Algorithm design and analysis; Cholesky factorization; Computer Science; distributed computing; Distributed, Parallel, and Cluster Computing; GPU; heterogeneous computing; Libraries; multicore; Productivity; Programming; Runtime; runtime system; sequential task flow; Supercomputers; task-based programming |
| Title | Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model |
| URI | https://ieeexplore.ieee.org/document/8226789 https://inria.hal.science/hal-01618526 |