OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks

Saved in:
Detailed bibliography
Title: OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks
Authors: Aguilar Mena, Jimmy, Ali, Omar Shaaban Ibrahim, Beltran Querol, Vicenç, Carpenter, Paul Matthew, Ayguadé Parra, Eduard, Labarta Mancho, Jesús José
Contributors: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Publisher information: Springer Nature
Year of publication: 2022
Collection: Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Subjects: UPC subject areas::Computer science::Computer architecture, Memory management (Computer science), Application program interfaces (Computer software), Data flow analysis, Memory architecture, Open systems, Program translators
Description: State-of-the-art programming approaches generally have a strict division between intra-node shared memory parallelism and inter-node MPI communication. Tasking with dependencies offers a clean, dependable abstraction for a wide range of hardware and situations within a node, but research on task offloading between nodes is still relatively immature. This paper presents a flexible task offloading extension of the OmpSs-2 programming model, which inherits task ordering from a sequential version of the code and uses a common address space to avoid address translation and simplify the use of data structures with pointers. It uses weak dependencies to enable work to be created concurrently. The program is executed in distributed dataflow fashion, and the runtime system overlaps the construction of the distributed dependency graph, enforces dependencies, transfers data, and schedules tasks for execution. Asynchronous task parallelism avoids synchronization that is often required in MPI+OpenMP tasks. Task scheduling is flexible, and data location is tracked through the dependencies. We wish to enable future work in resiliency, scalability, load balancing and malleability, and therefore release all source code and examples as open source. ; This research has received funding from the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No 955606 (DEEP-SEA) and 754337 (EuroEXA). It is supported by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB and Ramon y Cajal fellowship RYC2018-025628-I) and by the Generalitat de Catalunya (2017-SGR-1414). ; Peer Reviewed ; Postprint (author's final draft)
Document type: conference object
File description: 16 p.; application/pdf
Language: English
Relation: https://link.springer.com/chapter/10.1007/978-3-031-12597-3_20; info:eu-repo/grantAgreement/EC/H2020/754337/EU/Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon/EuroEXA; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/; https://hdl.handle.net/2117/377512
DOI: 10.1007/978-3-031-12597-3_20
Availability: https://hdl.handle.net/2117/377512
https://doi.org/10.1007/978-3-031-12597-3_20
Rights: Open Access
Accession number: edsbas.AFF8761
Database: BASE