Enhancing OmpSs-2 suspendable tasks by combining operating system and user-level threads with C++ coroutines
| Title: | Enhancing OmpSs-2 suspendable tasks by combining operating system and user-level threads with C++ coroutines |
|---|---|
| Authors: | Cinca Roca, Arnau; Roca Salvado, Albert; Sala Penadés, Kevin; Peñacoba Veigas, Raúl; Álvarez Robert, David; Beltran Querol, Vicenç |
| Contributors: | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Barcelona Supercomputing Center |
| Publisher information: | Institute of Electrical and Electronics Engineers (IEEE) |
| Publication year: | 2025 |
| Collection: | Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge |
| Subjects: | HPC, Task-based programming models, Coroutines, ULTs, OS, Runtime systems, Compilers, OmpSs-2, OpenMP, Cilk |
| Description: | This paper explores three methods for implementing suspendable tasks within task-based programming models: OS threads (pthreads), User-Level Threads (ULTs), and C++ coroutines. We enhance the OmpSs-2 programming model, originally supporting suspendable tasks via pthreads, to also accommodate ULTs and C++ coroutines. This unified approach facilitates a comprehensive comparative analysis using various benchmarks that include recursive fork-join and data-flow parallelization strategies. Additionally, we contrast these suspension methods with the Cilk and OpenMP task-based programming models, which, despite their efficiency, lack support for suspendable tasks. Key contributions of this study include the novel integration of C++20 coroutines into the OmpSs-2 programming model, which can be combined with pthreads or ULTs. Furthermore, we introduce a new Linux kernel syscall that accelerates pthread context switches by an order of magnitude, thus narrowing the performance gap between ULTs and pthreads in context-switch times from two to one order of magnitude. C++ coroutines are the ideal solution for scenarios where many tasks are simultaneously in a suspended state, or where the frequency of task suspension and resumption is high, because they have the smallest memory footprint and minor context-switch overhead. However, they are limited to C++ programs, do not support TLS, and only allow task suspension at top-level functions. Still, pthreads and ULTs can bring remarkable benefits where C++ coroutines cannot. We conclude that combining the strengths of coroutines with pthreads or ULTs brings productivity and performance benefits for programming models. ; This work was partially supported by the Generalitat de Catalunya (contract 2021-SGR-01007) and the Spanish Government through the Severo Ochoa Program (CEX2021-001148-S/MCIN/AEI/10.13039/501100011033). It is also part of the research project PID2023-147979NB-C21, funded by MICIU/AEI (10.13039/501100011033) and co-financed by the European Regional Development Fund (FEDER, ... |
| Document type: | conference object |
| File description: | 14 p.; application/pdf |
| Language: | English |
| Relation: | https://ieeexplore.ieee.org/document/11078500; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147979NB-C21/ES/HERRAMIENTAS SOFTWARE PARA HPC - BSC/; info:eu-repo/grantAgreement/EC/H2020/101034126/EU/Pilot using Independent Local & Open Technologies/The European PILOT; info:eu-repo/grantAgreement/AEI//CEX2021-001148-S; https://hdl.handle.net/2117/451772 |
| DOI: | 10.1109/IPDPS64566.2025.00015 |
| Availability: | https://hdl.handle.net/2117/451772 https://doi.org/10.1109/IPDPS64566.2025.00015 |
| Rights: | Open Access |
| Accession number: | edsbas.1D8FC88D |
| Database: | BASE |