A Locality-Based Threading Algorithm for the Configuration-Interaction Method

Bibliographic Details
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1178-1187
Main Authors: Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; McElvain, Kenneth
Format: Conference Proceeding
Language: English
Published: IEEE 03.07.2017
Description
Summary: The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, performance improves by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
DOI: 10.1109/IPDPSW.2017.15