Leveraging PaRSEC runtime support to tackle challenging 3D data-sparse matrix problems
| Title: | Leveraging PaRSEC runtime support to tackle challenging 3D data-sparse matrix problems |
|---|---|
| Authors: | Cao, Qinglei; Pei, Yu; Akbudak, Kadir; Bosilca, George; Ltaief, Hatem; Keyes, David E.; Dongarra, Jack |
| Contributors: | Extreme Computing Research Center, Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Applied Mathematics and Computational Science Program, Office of the President, University of Tennessee, Innovative Computing Laboratory, US, ASELSAN Research Center, Turkey, the Oak Ridge National Laboratory, US, University of Manchester, UK |
| Publisher information: | IEEE |
| Publication year: | 2021 |
| Collection: | King Abdullah University of Science and Technology: KAUST Repository |
| Subjects: | Low-rank matrix computations, Task-based programming model, Dynamic runtime system, Asynchronous executions and load balancing, High-performance computing, User productivity, Environmental applications |
| Description: | The task-based programming model associated with dynamic runtime systems has gained popularity for challenging problems because of workload imbalance, heterogeneous resources, or extreme concurrency. During the last decade, low-rank matrix approximations - where the main idea consists of exploiting data sparsity, typically by compressing off-diagonal tiles up to an application-specific accuracy threshold - have been adopted to address the curse of dimensionality at extreme scale. In this paper, we create a bridge between the runtime and the linear algebra by communicating knowledge of the data sparsity to the runtime. We design and implement this synergistic approach with high user productivity in mind, in the context of the PaRSEC runtime system and the HiCMA numerical library. This requires extending PaRSEC with new features to integrate rank information into the dataflow so that proper decisions can be made at runtime. We focus on the tile low-rank (TLR) Cholesky factorization for solving 3D data-sparse covariance matrix problems arising in environmental applications. In particular, we employ the 3D exponential model of the Matérn matrix kernel, which exhibits challenging nonuniform high ranks in off-diagonal tiles. We first provide dynamic data structure management driven by a performance model to reduce extra floating-point operations. Next, we optimize the memory footprint of the application by relying on a dynamic memory allocator, and supported by a rank-aware data distribution to cope with the workload imbalance. Finally, we expose further parallelism using kernel recursive formulations to shorten the critical path. Our resulting high-performance implementation outperforms existing data-sparse TLR Cholesky factorization by up to 7-fold on a large-scale distributed-memory system, while minimizing the memory footprint up to a 44-fold factor. This multidisciplinary work highlights the need to empower runtime systems beyond their original duty of task scheduling for servicing next-generation low-rank matrix ... |
| Document type: | conference object |
| File description: | application/pdf |
| Language: | unknown |
| ISBN: | 978-1-66544-066-0 1-66544-066-X |
| Relation: | https://ieeexplore.ieee.org/document/9460493/; https://repository.kaust.edu.sa/bitstream/10754/665738/1/ipdps2021-initial-submission.pdf; Cao, Q., Pei, Y., Akbudak, K., Bosilca, G., Ltaief, H., Keyes, D., & Dongarra, J. (2021). Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). doi:10.1109/ipdps49936.2021.00017; 2-s2.0-85113583277; 79-89; http://hdl.handle.net/10754/665738 |
| DOI: | 10.1109/IPDPS49936.2021.00017 |
| Availability: | http://hdl.handle.net/10754/665738; https://doi.org/10.1109/IPDPS49936.2021.00017 |
| Rights: | Archived with thanks to IEEE |
| Accession number: | edsbas.C633F165 |
| Database: | BASE |