A generic interface for parallel cell-based finite element operator application

Bibliographic Details
Published in:	Computers & Fluids, Vol. 63, pp. 135-147
Main Authors:	Kronbichler, Martin; Kormann, Katharina
Format:	Journal Article
Language:	English
Published:	Kidlington: Elsevier Ltd, 30.06.2012
Subjects:	Adaptive method; Computation; Computational fluid dynamics; Computational methods in fluid dynamics; Digital simulation; Dynamical systems; Exact sciences and technology; Finite element method; Finite/spectral element method; Fluid dynamics; Fundamental areas of phenomenology (including applications); Hybrid parallelization; Mathematical analysis; Matrix-free method; Mesh generation; Modelling; Operators; Parallel processing; Partial differential equations; Physics; Refinement method; Source code; Spectral element method; Sum-factorization
ISSN:	0045-7930, 1879-0747

Description
Summary:	Highlights:
► Implementation framework for finite element operator application.
► Efficient data structures for high performance, including sum-factorization.
► Hybrid parallelization including MPI, shared memory, and vectorization.
► Operator application reaches up to 70% of system’s peak performance.
► Framework outperforms sparse matrix–vector products for element order two and higher.
Abstract:	We present a memory-efficient and parallel framework for finite element operator application implemented in the generic open-source library deal.II. Instead of assembling a sparse matrix and using it for matrix–vector products, the operation is applied by cell-wise quadrature. The evaluation of shape functions is implemented with a sum-factorization approach. Our implementation is parallelized on three levels to exploit modern supercomputer architecture in an optimal way: MPI over remote nodes, thread parallelization with dynamic task scheduling within the nodes, and explicit vectorization for utilizing processors’ vector units. Special data structures are designed for high performance and to keep the memory requirements to a minimum. The framework handles adaptively refined meshes and systems of partial differential equations. We provide performance tests for both linear and nonlinear PDEs which show that our cell-based implementation is faster than sparse matrix–vector products for polynomial order two and higher on hexahedral elements and yields ten times higher Gflops rates.
DOI:	10.1016/j.compfluid.2012.04.012
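
Illustration:	The summary names sum-factorization as the technique used to evaluate shape functions at quadrature points. The sketch below is not taken from the article or from deal.II; it is a minimal pure-Python illustration of the general idea on a single 2D cell, assuming quadratic Lagrange shape functions and a 3-point Gauss rule on [0, 1]. All names (NODES, QUAD_POINTS, shape_matrix, interpolate_naive, interpolate_sum_factorized) are hypothetical. With n shape functions and n quadrature points per direction in d dimensions, the naive evaluation costs on the order of n^(2d) multiply-adds per cell, while the sequence of 1D sweeps costs on the order of d * n^(d+1).

```python
import math

# Assumed setup: quadratic Lagrange basis and 3-point Gauss rule on [0, 1].
NODES = [0.0, 0.5, 1.0]
QUAD_POINTS = [0.5 - 0.5 * math.sqrt(0.6), 0.5, 0.5 + 0.5 * math.sqrt(0.6)]


def lagrange_basis(nodes, i, x):
    """Value at x of the i-th 1D Lagrange polynomial on the given nodes."""
    value = 1.0
    for k, node in enumerate(nodes):
        if k != i:
            value *= (x - node) / (nodes[i] - node)
    return value


def shape_matrix(nodes, points):
    """S[q][i] = value of 1D shape function i at 1D quadrature point q."""
    return [[lagrange_basis(nodes, i, x) for i in range(len(nodes))]
            for x in points]


def interpolate_naive(S, coeffs):
    """u[q1][q2] = sum_{i,j} S[q1][i] * S[q2][j] * coeffs[i][j].

    Every quadrature point visits every 2D shape function: n_q^2 * n^2 terms.
    """
    n_q, n = len(S), len(S[0])
    return [[sum(S[q1][i] * S[q2][j] * coeffs[i][j]
                 for i in range(n) for j in range(n))
             for q2 in range(n_q)]
            for q1 in range(n_q)]


def interpolate_sum_factorized(S, coeffs):
    """Same values as interpolate_naive, computed as two 1D sweeps."""
    n_q, n = len(S), len(S[0])
    # Sweep 1 contracts index j: tmp[i][q2] = sum_j S[q2][j] * coeffs[i][j]
    tmp = [[sum(S[q2][j] * coeffs[i][j] for j in range(n))
            for q2 in range(n_q)]
           for i in range(n)]
    # Sweep 2 contracts index i: u[q1][q2] = sum_i S[q1][i] * tmp[i][q2]
    return [[sum(S[q1][i] * tmp[i][q2] for i in range(n))
             for q2 in range(n_q)]
            for q1 in range(n_q)]


if __name__ == "__main__":
    S = shape_matrix(NODES, QUAD_POINTS)
    # Nodal coefficients of an arbitrary function, here f(x, y) = x * y.
    coeffs = [[x * y for y in NODES] for x in NODES]
    a = interpolate_naive(S, coeffs)
    b = interpolate_sum_factorized(S, coeffs)
    max_diff = max(abs(a[p][q] - b[p][q]) for p in range(3) for q in range(3))
    print("max difference between the two evaluations:", max_diff)
```

Both functions return the same values at the quadrature points; only the order of the summations differs. In three dimensions the same idea applies with three sweeps, which is where the saving over the naive loop becomes most pronounced.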