A Locality-Based Threading Algorithm for the Configuration-Interaction Method

Bibliographic Details
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1178-1187
Main authors: Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; McElvain, Kenneth
Format: Conference paper
Language: English
Published: IEEE, 03.07.2017
Description
Summary: The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
DOI: 10.1109/IPDPSW.2017.15