OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks
| Title: | OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks |
|---|---|
| Authors: | Aguilar Mena, Jimmy, Ali, Omar Shaaban Ibrahim, Beltran Querol, Vicenç, Carpenter, Paul Matthew, Ayguadé Parra, Eduard, Labarta Mancho, Jesús José |
| Contributors: | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
| Publisher: | Springer Nature |
| Publication year: | 2022 |
| Collection: | Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge |
| Subjects: | Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, Memory management (Computer science), Application program interfaces (Computer software), Data flow analysis, Memory architecture, Open systems, Program translators, Gestió de memòria (Informàtica), Interfícies de programació d'aplicacions (Programari) |
| Description: | State-of-the-art programming approaches generally have a strict division between intra-node shared memory parallelism and inter-node MPI communication. Tasking with dependencies offers a clean, dependable abstraction for a wide range of hardware and situations within a node, but research on task offloading between nodes is still relatively immature. This paper presents a flexible task offloading extension of the OmpSs-2 programming model, which inherits task ordering from a sequential version of the code and uses a common address space to avoid address translation and simplify the use of data structures with pointers. It uses weak dependencies to enable work to be created concurrently. The program is executed in distributed dataflow fashion, and the runtime system overlaps the construction of the distributed dependency graph, enforces dependencies, transfers data, and schedules tasks for execution. Asynchronous task parallelism avoids synchronization that is often required in MPI+OpenMP tasks. Task scheduling is flexible, and data location is tracked through the dependencies. We wish to enable future work in resiliency, scalability, load balancing and malleability, and therefore release all source code and examples open source. ; This research has received funding from the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No 955606 (DEEP-SEA) and 754337 (EuroEXA). It is supported by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB and Ramon y Cajal fellowship RYC2018-025628-I) and by the Generalitat de Catalunya (2017-SGR-1414). ; Peer Reviewed ; Postprint (author's final draft) |
| Document type: | conference object |
| File description: | 16 p.; application/pdf |
| Language: | English |
| Relation: | https://link.springer.com/chapter/10.1007/978-3-031-12597-3_20; info:eu-repo/grantAgreement/EC/H2020/754337/EU/Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon/EuroEXA; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/; https://hdl.handle.net/2117/377512 |
| DOI: | 10.1007/978-3-031-12597-3_20 |
| Availability: | https://hdl.handle.net/2117/377512 https://doi.org/10.1007/978-3-031-12597-3_20 |
| Rights: | Open Access |
| Accession number: | edsbas.AFF8761 |
| Database: | BASE |