A generic interface for parallel cell-based finite element operator application


Bibliographic details
Published in: Computers & Fluids, Vol. 63, pp. 135-147
Main authors: Kronbichler, Martin; Kormann, Katharina
Format: Journal Article
Language: English
Published: Kidlington: Elsevier Ltd, 30 June 2012
ISSN: 0045-7930, 1879-0747
Description
Abstract:
► Implementation framework for finite element operator application.
► Efficient data structures for high performance, including sum-factorization.
► Hybrid parallelization including MPI, shared memory, and vectorization.
► Operator application reaches up to 70% of system’s peak performance.
► Framework outperforms sparse matrix–vector products for element order two and higher.
We present a memory-efficient and parallel framework for finite element operator application implemented in the generic open-source library deal.II. Instead of assembling a sparse matrix and using it for matrix–vector products, the operation is applied by cell-wise quadrature. The evaluation of shape functions is implemented with a sum-factorization approach. Our implementation is parallelized on three levels to exploit modern supercomputer architecture in an optimal way: MPI over remote nodes, thread parallelization with dynamic task scheduling within the nodes, and explicit vectorization for utilizing processors’ vector units. Special data structures are designed for high performance and to keep the memory requirements to a minimum. The framework handles adaptively refined meshes and systems of partial differential equations. We provide performance tests for both linear and nonlinear PDEs which show that our cell-based implementation is faster than sparse matrix–vector products for polynomial order two and higher on hexahedral elements and yields ten times higher Gflops rates.
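The central idea of the abstract, applying the operator by cell-wise quadrature instead of multiplying with an assembled matrix, can be illustrated with a small sketch. This is not the deal.II implementation: it is a pure-Python toy that assumes a uniform 1D mesh, linear elements, a two-point Gauss rule, and a hypothetical coefficient a(x) = 1 + x, and all function names are invented for illustration. Sum-factorization (which needs tensor-product shape functions in 2D or 3D) and the three levels of parallelization are not shown.

```python
import math

# Two-point Gauss-Legendre rule mapped to the reference cell [0, 1].
GAUSS_POINTS = (0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0))
GAUSS_WEIGHTS = (0.5, 0.5)

# Reference-cell derivatives of the two linear shape functions.
SHAPE_GRADS = (-1.0, 1.0)


def coefficient(x):
    """Hypothetical variable coefficient a(x), chosen only for illustration."""
    return 1.0 + x


def apply_matrix_free(u, n_cells, length=1.0):
    """Return y = A u for the weak form of -(a u')' without storing A."""
    h = length / n_cells
    y = [0.0] * (n_cells + 1)
    for cell in range(n_cells):
        # Gather the two local degrees of freedom of this cell.
        u_left, u_right = u[cell], u[cell + 1]
        # For linear elements du/dx is constant on the cell.
        grad_u = (SHAPE_GRADS[0] * u_left + SHAPE_GRADS[1] * u_right) / h
        for x_ref, weight in zip(GAUSS_POINTS, GAUSS_WEIGHTS):
            x = (cell + x_ref) * h
            # a(x_q) * u'(x_q) * (quadrature weight * Jacobian)
            flux = coefficient(x) * grad_u * (weight * h)
            # Test with both shape-function gradients and scatter-add.
            y[cell] += (SHAPE_GRADS[0] / h) * flux
            y[cell + 1] += (SHAPE_GRADS[1] / h) * flux
    return y


def assemble_matrix(n_cells, length=1.0):
    """Assemble the same operator as a matrix (dense here for brevity)."""
    h = length / n_cells
    n = n_cells + 1
    matrix = [[0.0] * n for _ in range(n)]
    for cell in range(n_cells):
        for x_ref, weight in zip(GAUSS_POINTS, GAUSS_WEIGHTS):
            scale = coefficient((cell + x_ref) * h) * (weight * h) / (h * h)
            for i in range(2):
                for j in range(2):
                    matrix[cell + i][cell + j] += (
                        SHAPE_GRADS[i] * SHAPE_GRADS[j] * scale
                    )
    return matrix


def matvec(matrix, u):
    """Plain matrix-vector product used as the reference result."""
    return [sum(a * x for a, x in zip(row, u)) for row in matrix]


if __name__ == "__main__":
    n_cells = 8
    u = [math.sin(3.0 * i) for i in range(n_cells + 1)]
    y_free = apply_matrix_free(u, n_cells)
    y_matrix = matvec(assemble_matrix(n_cells), u)
    print(max(abs(a - b) for a, b in zip(y_free, y_matrix)))
```

Both routes use the same quadrature rule, so they should agree up to rounding. The cell loop keeps only the input and output vectors in memory and re-evaluates the coefficient on every application, which is the trade of extra arithmetic for less stored data that the abstract describes.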
DOI: 10.1016/j.compfluid.2012.04.012