Hierarchical management of extreme-scale task-based applications

Detailed bibliography
Title: Hierarchical management of extreme-scale task-based applications
Authors: Lordan Gomis, Francesc; Puigdemunt Schmolling, Gabriel; Vergés Boncompte, Pere; Conejero Bañón, Francisco Javier; Ejarque Artigas, Jorge; Badia Sala, Rosa Maria
Contributors: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Barcelona Supercomputing Center
Publisher information: Springer Cham
Publication year: 2023
Collection: Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Subjects: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, Electronic data processing -- Distributed processing, High performance computing, Distributed systems, Exascale, Task-based, Programming model, Workflow, Hierarchy, Runtime system, Peer-to-peer, Decentralized management, Processament distribuït de dades, Càlcul intensiu (Informàtica)
Description: The scale and heterogeneity of exascale systems increment the complexity of programming applications exploiting them. Task-based approaches with support for nested tasks are a good-fitting model for them because of the flexibility lying in the task concept. Resembling the hierarchical organization of the hardware, this paper proposes establishing a hierarchy in the application workflow for mapping coarse-grain tasks to the broader hardware components and finer-grain tasks to the lowest levels of the resource hierarchy to benefit from lower-latency and higher-bandwidth communications and exploiting locality. Building on a proposed mechanism to encapsulate within the task the management of its finer-grain parallelism, the paper presents a hierarchical peer-to-peer engine orchestrating the execution of workflow hierarchies with fully-decentralized management. The tests conducted on the MareNostrum 4 supercomputer using a prototype implementation prove the validity of the proposal supporting the execution of up to 707,653 tasks using 2,400 cores and achieving speedups of up to 106 times faster than executions of a single workflow and centralized management. ; This work has been supported by the Spanish Government (PID2019-107255GB), by MCIN/AEI /10.13039/501100011033 (CEX2021-001148-S), by the Departament de Recerca i Universitats de la Generalitat de Catalunya to the Research Group MPiEDist (2021 SGR 00412), and by the European Commission through the Horizon Europe Research and Innovation program under Grant Agreements 101070177 (ICOS project) and 101016577 (AI-Sprint project). ; Peer Reviewed ; Postprint (author's final draft)
Document type: conference object
File description: 14 p.; application/pdf
Language: English
Relation: https://link.springer.com/chapter/10.1007/978-3-031-39698-4_8; info:eu-repo/grantAgreement/EC/HE/101070177/EU/Towards a functional continuum operating system/ICOS; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/; http://hdl.handle.net/2117/393224
DOI: 10.1007/978-3-031-39698-4_8
Availability: http://hdl.handle.net/2117/393224
https://doi.org/10.1007/978-3-031-39698-4_8
Rights: Open Access
Accession number: edsbas.FDAC4E24
Database: BASE