Enhancing OmpSs-2 suspendable tasks by combining operating system and user-level threads with C++ coroutines

Bibliographic details
Title: Enhancing OmpSs-2 suspendable tasks by combining operating system and user-level threads with C++ coroutines
Authors: Cinca Roca, Arnau; Roca Salvado, Albert; Sala Penadés, Kevin; Peñacoba Veigas, Raúl; Álvarez Robert, David; Beltran Querol, Vicenç
Contributors: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Barcelona Supercomputing Center
Publisher information: Institute of Electrical and Electronics Engineers (IEEE)
Publication year: 2025
Collection: Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Subjects: HPC, Task-based programming models, Coroutines, ULTs, OS, Runtime systems, Compilers, OmpSs-2, OpenMP, Cilk
Description: This paper explores three methods for implementing suspendable tasks within task-based programming models: OS threads (pthreads), User-Level Threads (ULTs), and C++ coroutines. We enhance the OmpSs-2 programming model, which originally supported suspendable tasks via pthreads, to also accommodate ULTs and C++ coroutines. This unified approach facilitates a comprehensive comparative analysis using various benchmarks that include recursive fork-join and data-flow parallelization strategies. Additionally, we contrast these suspension methods with the Cilk and OpenMP task-based programming models, which, despite their efficiency, lack support for suspendable tasks. Key contributions of this study include the novel integration of C++20 coroutines into the OmpSs-2 programming model, which can be combined with pthreads or ULTs. Furthermore, we introduce a new Linux kernel syscall that accelerates pthread context switches by an order of magnitude, thus narrowing the performance gap between ULTs and pthreads in context-switch times from two orders of magnitude to one. C++ coroutines are the ideal solution when many tasks are suspended simultaneously or when tasks are suspended and resumed frequently, because they have the smallest memory footprint and minor context-switch overhead. However, they are limited to C++ programs, do not support thread-local storage (TLS), and only allow task suspension at top-level functions. Still, pthreads and ULTs can bring remarkable benefits where C++ coroutines cannot. We conclude that combining the strengths of coroutines with pthreads or ULTs brings productivity and performance benefits for programming models.
This work was partially supported by the Generalitat de Catalunya (contract 2021-SGR-01007) and the Spanish Government through the Severo Ochoa Program (CEX2021-001148-S/MCIN/AEI/10.13039/501100011033).
It is also part of the research project PID2023-147979NB-C21, funded by MICIU/AEI (10.13039/501100011033) and co-financed by the European Regional Development Fund (FEDER, ...
Document type: conference object
File description: 14 p.; application/pdf
Language: English
Relation: https://ieeexplore.ieee.org/document/11078500; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147979NB-C21/ES/HERRAMIENTAS SOFTWARE PARA HPC - BSC/; info:eu-repo/grantAgreement/EC/H2020/101034126/EU/Pilot using Independent Local & Open Technologies/The European PILOT; info:eu-repo/grantAgreement/AEI//CEX2021-001148-S; https://hdl.handle.net/2117/451772
DOI: 10.1109/IPDPS64566.2025.00015
Availability: https://hdl.handle.net/2117/451772
https://doi.org/10.1109/IPDPS64566.2025.00015
Rights: Open Access
Accession number: edsbas.1D8FC88D
Database: BASE