Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers

Bibliographic details
Published in: Journal of Computational Chemistry, Vol. 37, No. 30, pp. 2623-2633
Main authors: Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
Format: Journal Article
Language: English
Publication details: United States, Blackwell Publishing Ltd, 15 November 2016
Wiley Subscription Services, Inc
ISSN: 0192-8651, 1096-987X
Description
Summary: A new parallel algorithm and its implementation for RI-MP2 energy calculations on peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node, multi-GPU implementation of the algorithm is also presented for calculations on central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputers. Benchmark results on the K computer (CPU cluster system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high parallel efficiency. A peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node, multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
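For readers unfamiliar with the quantity being computed, the following is a minimal serial sketch of the standard closed-shell RI-MP2 correlation energy in NumPy. It is not the authors' implementation, and it does not reproduce their parallelization or communication schemes. The array layout `B[P, i, a]`, the function names, and the round-robin split of occupied pairs over hypothetical worker groups are all assumptions made for illustration. The split only shows that the per-pair contributions are independent and can be summed in any order, which is the property a parallel scheme can exploit.

```python
import numpy as np

def rimp2_pair_energy(B, e_occ, e_vir, i, j):
    """Contribution of one occupied pair (i, j) to the RI-MP2 energy.

    B[P, i, a] holds three-index RI intermediates (assumed layout), so that
    the four-index integral is approximated as (ia|jb) ~ sum_P B[P,i,a] B[P,j,b].
    """
    # V[a, b] = (ia|jb), built with one matrix product over the auxiliary index P
    V = B[:, i, :].T @ B[:, j, :]
    # Orbital-energy denominator e_i + e_j - e_a - e_b
    D = e_occ[i] + e_occ[j] - e_vir[:, None] - e_vir[None, :]
    # V.T[a, b] = (ib|ja), the exchange-type term
    return np.sum(V * (2.0 * V - V.T) / D)

def rimp2_energy(B, e_occ, e_vir):
    """Closed-shell RI-MP2 correlation energy as a sum over all (i, j) pairs."""
    n_occ = e_occ.size
    return sum(rimp2_pair_energy(B, e_occ, e_vir, i, j)
               for i in range(n_occ) for j in range(n_occ))

def rimp2_energy_by_groups(B, e_occ, e_vir, n_groups):
    """Same energy, with pairs dealt round-robin to hypothetical worker groups.

    Each partial sum could be computed independently; here they are simply
    evaluated one after another and added together.
    """
    n_occ = e_occ.size
    pairs = [(i, j) for i in range(n_occ) for j in range(n_occ)]
    partial = np.zeros(n_groups)
    for k, (i, j) in enumerate(pairs):
        partial[k % n_groups] += rimp2_pair_energy(B, e_occ, e_vir, i, j)
    return partial.sum()
```

In a distributed setting each group would hold only its own partial sum and a final reduction would combine them; how the work and data are actually divided in the paper is described in the full text, not here.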
Bibliography: ArticleID: JCC24491
Next-Generation Supercomputer project (the K computer project) by MEXT, Japan
TSUBAME grand challenge program, category A by Tokyo Institute of Technology
ark:/67375/WNG-C2W2TB8R-Z
istex:2E001E55C6A54F0B72DC68896DA63DCAB7383383
DOI: 10.1002/jcc.24491