Simultaneously solving swarms of small sparse systems on SIMD silicon
| Published in: | 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) pp. 1128 - 1137 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.05.2017 |
| Subjects: | |
| Summary: | A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small non-linear coupled chemistry and flow systems, one-dimensional sub-systems in climate and diffusion simulations, and semi-implicit time integrators, among others. We introduce an approach for solving large numbers of independent banded matrix problems on SIMD architectures. Unlike many vectorized or batched approaches that rely on reusing the matrix factorization across multiple solves, our algorithm supports batches of matrices that differ (due to spatial variation or non-linear solvers, for example). We present an implementation of our approach for diagonally dominant tridiagonal systems that is optimized via compiler directives, tiling, and choice of data layout. Performance is evaluated on three Intel microarchitectures with different cache, vectorization, and threading features: Ivy Bridge, Haswell, and Knights Landing. Finally, we show that our solver improves on existing approaches and achieves ~90% of STREAM Triad effective bandwidth on all three platforms. |
| DOI: | 10.1109/IPDPSW.2017.114 |
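
The summary describes solving many independent, differing tridiagonal systems at once with a data layout chosen for SIMD. As a rough illustration of that general idea, and not the authors' implementation, the sketch below applies the textbook Thomas algorithm to a whole batch, storing row `i` of system `s` at index `i * nbatch + s` so that the inner loop over systems is unit-stride. The function name `batched_thomas`, the `IDX` macro, the `scratch` workspace, and the interleaved layout itself are all assumptions made for this example; the tiling and compiler directives mentioned in the summary are omitted.

```c
#include <stddef.h>

/* Hypothetical layout: row i of system s lives at i * nbatch + s,
 * so consecutive systems are adjacent in memory. */
#define IDX(i, s, nbatch) ((size_t)(i) * (nbatch) + (s))

/*
 * Solve nbatch independent n-by-n tridiagonal systems with the Thomas
 * algorithm (no pivoting, so diagonal dominance is assumed).
 *
 * lower, diag, upper: sub-, main and super-diagonals, n * nbatch each.
 *   Row 0 of lower and row n-1 of upper must be allocated; their values
 *   do not affect the result.
 * rhs:     right-hand sides on entry, solutions on exit.
 * scratch: workspace of n * nbatch doubles.
 */
void batched_thomas(size_t n, size_t nbatch,
                    const double *restrict lower,
                    const double *restrict diag,
                    const double *restrict upper,
                    double *restrict rhs,
                    double *restrict scratch)
{
    if (n == 0) {
        return;
    }

    /* Forward elimination, first row. */
    for (size_t s = 0; s < nbatch; ++s) {
        double inv = 1.0 / diag[IDX(0, s, nbatch)];
        scratch[IDX(0, s, nbatch)] = upper[IDX(0, s, nbatch)] * inv;
        rhs[IDX(0, s, nbatch)] *= inv;
    }

    /* Forward elimination, remaining rows. The recurrence runs over i;
     * the inner loop over s has no cross-iteration dependence, so a
     * compiler can vectorize it (a SIMD directive could be added here). */
    for (size_t i = 1; i < n; ++i) {
        for (size_t s = 0; s < nbatch; ++s) {
            double l = lower[IDX(i, s, nbatch)];
            double inv = 1.0 / (diag[IDX(i, s, nbatch)]
                                - l * scratch[IDX(i - 1, s, nbatch)]);
            scratch[IDX(i, s, nbatch)] = upper[IDX(i, s, nbatch)] * inv;
            rhs[IDX(i, s, nbatch)] =
                (rhs[IDX(i, s, nbatch)] - l * rhs[IDX(i - 1, s, nbatch)]) * inv;
        }
    }

    /* Back substitution, from row n-2 down to row 0. */
    for (size_t i = n - 1; i-- > 0; ) {
        for (size_t s = 0; s < nbatch; ++s) {
            rhs[IDX(i, s, nbatch)] -=
                scratch[IDX(i, s, nbatch)] * rhs[IDX(i + 1, s, nbatch)];
        }
    }
}
```

Because every system carries its own coefficients, nothing is reused between solves, which matches the case the summary singles out (matrices that differ across the batch). Each coefficient is read once and each solution written once, so a kernel of this shape is plausibly limited by memory bandwidth, which is presumably why the summary reports performance relative to STREAM Triad.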