A distributed-memory hierarchical solver for general sparse linear systems
| Published in: | Parallel Computing, Vol. 74, Issue C, pp. 49-64 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | United States, Elsevier B.V., 1 May 2018 |
| Subjects: | |
| ISSN: | 0167-8191, 1872-7336 |
| Summary: | • Derived a new formulation of a sequential hierarchical solver, which compresses dense fill-in blocks. • Proposed a new parallel algorithm for solving general sparse linear systems based on data decomposition. • Implemented a task-based asynchronous scheme by exploiting data dependency in our algorithm. • Implemented a coloring scheme to extract concurrency in the execution. • Provided benchmarks for various problems and analysis of parallel scalability under different conditions. We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm. |
|---|---|
| Bibliography: | AC04-94AL85000; NA0002373-1; AC02-05CH11231; NA-0003525; USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA); Stanford Univ., CA (United States); SAND2017-0977J |
| DOI: | 10.1016/j.parco.2017.12.004 |
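
The summary's statement that the computation-to-communication ratio is approximately the volume-to-surface-area ratio of each processor's subdomain can be illustrated with a small sketch. It is not taken from the article: it assumes an idealized cubic grid split evenly into cubic subdomains, whereas the solver itself is algebraic and targets general sparse matrices. The function name `subdomain_ratio` and its parameters are hypothetical.

```python
def subdomain_ratio(n_global: int, procs_per_dim: int) -> float:
    """Idealized volume-to-surface-area ratio of one cubic subdomain.

    Assumption (not from the article): an n_global^3 grid is divided
    evenly among procs_per_dim^3 processors, each owning a cube.
    """
    if n_global % procs_per_dim != 0:
        raise ValueError("n_global must be divisible by procs_per_dim")
    n_local = n_global // procs_per_dim  # edge length of one subdomain
    volume = n_local ** 3                # proxy for local computation
    surface = 6 * n_local ** 2           # proxy for boundary data exchanged
    return volume / surface              # simplifies to n_local / 6


if __name__ == "__main__":
    # Strong scaling: fixed problem size, more processors per dimension.
    for p in (2, 4, 8):
        print(p ** 3, "processors:", subdomain_ratio(256, p))
```

Under this model the ratio simplifies to the subdomain edge length divided by 6, so halving the edge length (eight times as many processors for the same problem) halves the ratio. That is one way to read why communication weighs more heavily as a fixed-size problem is spread over more processors.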