Simultaneously solving swarms of small sparse systems on SIMD silicon

Bibliographic details
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1128-1137
Main authors: Lelbach, Bryce Adelstein; Johansen, Hans; Williams, Samuel
Format: Conference paper
Language: English
Publication details: IEEE, 01.05.2017
Description
Summary: A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small non-linear coupled chemistry and flow systems, one-dimensional sub-systems in climate and diffusion simulations, and semi-implicit time integrators, among others. We introduce an approach for solving large quantities of independent banded matrix problems on SIMD architectures. Unlike many vectorized or batched approaches that rely on reusing the matrix factorization across multiple solves, our algorithm supports batches of matrices that differ (due to spatial variation or non-linear solvers, for example). We present an implementation of our approach for diagonally dominant tridiagonal systems that is optimized via compiler directives, tiling, and choice of data layout. Performance is evaluated on three Intel micro-architectures with different cache, vectorization, and threading features: Intel Ivy Bridge, Haswell, and Knights Landing. Finally, we show that our solver improves on existing approaches and achieves ~90% of STREAM Triad effective bandwidth on all three platforms.
DOI: 10.1109/IPDPSW.2017.114
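
Illustrative sketch (not taken from the paper): the summary describes solving many independent, differing tridiagonal systems at once on SIMD hardware, with data layout as one of the tuning choices. The C++ fragment below shows one common way such a batch can be arranged, assuming an interleaved layout in which element i of system s is stored at index i * batch + s. The function name solve_tridiagonal_batch, its parameters, and the layout are assumptions made for illustration. It uses the textbook Thomas algorithm without pivoting, which is appropriate for diagonally dominant systems, and it omits the tiling and compiler directives the authors mention.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not the authors' implementation.
// Solves `batch` independent tridiagonal systems, each of size `n`.
// Assumed layout: element i of system s is stored at index i * batch + s,
// so every inner loop over s is unit-stride and a compiler can vectorize it.
//   a: sub-diagonal   (entries for row 0 are unused)
//   b: main diagonal  (overwritten during elimination)
//   c: super-diagonal (entries for row n-1 are unused)
//   d: right-hand side on entry, solution on exit
void solve_tridiagonal_batch(std::size_t n, std::size_t batch,
                             const std::vector<double>& a,
                             std::vector<double>& b,
                             const std::vector<double>& c,
                             std::vector<double>& d)
{
    if (n == 0 || batch == 0) return;

    // Forward elimination. Each system has its own coefficients, so the
    // multiplier m is recomputed per system rather than reused.
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t s = 0; s < batch; ++s) {
            const std::size_t cur = i * batch + s;
            const std::size_t prev = cur - batch;
            const double m = a[cur] / b[prev];
            b[cur] -= m * c[prev];
            d[cur] -= m * d[prev];
        }
    }

    // Back substitution, starting from the last row of every system.
    for (std::size_t s = 0; s < batch; ++s) {
        const std::size_t last = (n - 1) * batch + s;
        d[last] /= b[last];
    }
    for (std::size_t i = n - 1; i-- > 0;) {
        for (std::size_t s = 0; s < batch; ++s) {
            const std::size_t cur = i * batch + s;
            const std::size_t next = cur + batch;
            d[cur] = (d[cur] - c[cur] * d[next]) / b[cur];
        }
    }
}
```

Because the systems are independent, the loop over s carries no dependence between iterations; the sequential dependence of the Thomas algorithm is confined to the outer loop over rows. Whether this layout matches the one chosen in the paper is not stated in the summary.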