A Locality-Based Threading Algorithm for the Configuration-Interaction Method
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we...
Saved in:
| Published in: | 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) pp. 1178 - 1187 |
|---|---|
| Main Authors: | , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: |
IEEE
03.07.2017
|
| Subjects: | |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality,-which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threadson the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on theKnights Landing processor and 3× on the dual-socket Ivy Bridge node. |
|---|---|
| DOI: | 10.1109/IPDPSW.2017.15 |