A generic interface for parallel cell-based finite element operator application

Bibliographic Details
Published in: Computers & Fluids, Vol. 63, pp. 135–147
Main Authors: Kronbichler, Martin; Kormann, Katharina
Format: Journal Article
Language: English
Published: Elsevier Ltd, Kidlington, 30.06.2012
ISSN: 0045-7930, 1879-0747
Description
Summary:
► Implementation framework for finite element operator application.
► Efficient data structures for high performance, including sum-factorization.
► Hybrid parallelization including MPI, shared memory, and vectorization.
► Operator application reaches up to 70% of the system’s peak performance.
► Framework outperforms sparse matrix–vector products for element order two and higher.

We present a memory-efficient and parallel framework for finite element operator application implemented in the generic open-source library deal.II. Instead of assembling a sparse matrix and using it for matrix–vector products, the operation is applied by cell-wise quadrature. The evaluation of shape functions is implemented with a sum-factorization approach. Our implementation is parallelized on three levels to exploit modern supercomputer architecture in an optimal way: MPI over remote nodes, thread parallelization with dynamic task scheduling within the nodes, and explicit vectorization for utilizing processors’ vector units. Special data structures are designed for high performance and to keep the memory requirements to a minimum. The framework handles adaptively refined meshes and systems of partial differential equations. We provide performance tests for both linear and nonlinear PDEs which show that our cell-based implementation is faster than sparse matrix–vector products for polynomial order two and higher on hexahedral elements and yields ten times higher Gflops rates.
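The core idea of the abstract is to apply an operator by cell-wise quadrature with sum-factorized shape-function evaluation instead of a stored sparse matrix. It can be illustrated on a single 2D cell. The plain-Python sketch below is not deal.II code and does not use its interfaces. The following are all assumptions chosen for illustration: the degree-2 Lagrange basis on the nodes 0, 0.5, 1, the 3-point Gauss–Legendre rule, the unit-square cell, and every function name (`evaluate`, `integrate`, `apply_mass_matrix_free`, `assemble_mass`). The sketch applies a mass operator matrix-free and checks the result against an explicitly assembled local matrix.

```python
import math
import random

# Hypothetical 1D setting: degree-2 Lagrange basis on the nodes 0, 0.5, 1 and
# the 3-point Gauss-Legendre rule mapped to the unit interval [0, 1].
NODES = [0.0, 0.5, 1.0]
QP = [0.5 - math.sqrt(15.0) / 10.0, 0.5, 0.5 + math.sqrt(15.0) / 10.0]
QW = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]


def lagrange(i, x):
    """Value of the i-th 1D Lagrange polynomial on NODES at point x."""
    val = 1.0
    for k, xk in enumerate(NODES):
        if k != i:
            val *= (x - xk) / (NODES[i] - xk)
    return val


# 1D shape-value matrix S[q][i] = phi_i(x_q); the 2D basis is phi_i(x) * phi_j(y).
S = [[lagrange(i, x) for i in range(len(NODES))] for x in QP]


def evaluate(S, u):
    """Sum factorization: values at 2D quadrature points from coefficients u[i][j]."""
    nq, n = len(S), len(S[0])
    # Contract the x-index first, then the y-index: O(nq*n^2 + nq^2*n) work.
    t = [[sum(S[q1][i] * u[i][j] for i in range(n)) for j in range(n)] for q1 in range(nq)]
    return [[sum(S[q2][j] * t[q1][j] for j in range(n)) for q2 in range(nq)] for q1 in range(nq)]


def integrate(S, v):
    """Transposed sum factorization: test quadrature data v[q1][q2] against all basis functions."""
    nq, n = len(S), len(S[0])
    t = [[sum(S[q2][j] * v[q1][q2] for q2 in range(nq)) for j in range(n)] for q1 in range(nq)]
    return [[sum(S[q1][i] * t[q1][j] for q1 in range(nq)) for j in range(n)] for i in range(n)]


def apply_mass_matrix_free(S, w, u):
    """Apply the cell mass matrix without forming it: evaluate, weight, integrate."""
    v = evaluate(S, u)
    nq = len(w)
    v = [[w[q1] * w[q2] * v[q1][q2] for q2 in range(nq)] for q1 in range(nq)]
    return integrate(S, v)


def assemble_mass(S, w):
    """Reference: dense local mass matrix with DoF (i, j) stored at index i*n + j."""
    nq, n = len(S), len(S[0])
    M = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for m in range(n):
                    M[i * n + j][k * n + m] = sum(
                        w[q1] * w[q2] * S[q1][i] * S[q2][j] * S[q1][k] * S[q2][m]
                        for q1 in range(nq) for q2 in range(nq))
    return M


if __name__ == "__main__":
    random.seed(0)
    n = len(NODES)
    u = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    M = assemble_mass(S, QW)
    flat = [u[i][j] for i in range(n) for j in range(n)]
    ref = [sum(M[a][b] * flat[b] for b in range(n * n)) for a in range(n * n)]
    mf = apply_mass_matrix_free(S, QW, u)
    # The two results should agree up to floating-point round-off.
    print(max(abs(mf[i][j] - ref[i * n + j]) for i in range(n) for j in range(n)))
```

The dense local matrix holds (p+1)^(2d) entries per cell for polynomial degree p in d dimensions. The sum-factorized path stores only the small 1D matrix `S` and performs O(d (p+1)^(d+1)) operations per cell evaluation. Its advantage over a stored matrix therefore grows with the polynomial degree.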
DOI: 10.1016/j.compfluid.2012.04.012