An Efficient RI-MP2 Algorithm for Distributed Many-GPU Architectures
| Published in: | Journal of chemical theory and computation, Vol. 20, No. 21, p. 9394 |
|---|---|
| Main authors: | Snowdon, Calum; Barca, Giuseppe M J |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 12 November 2024 |
| ISSN: | 1549-9626, 1549-9626 |
| Abstract | Second-order Møller-Plesset perturbation theory (MP2) using the Resolution of the Identity approximation (RI-MP2) is a widely used method for computing molecular energies beyond the Hartree-Fock mean-field approximation. However, its high computational cost and lack of efficient algorithms for modern supercomputing architectures limit its applicability to large molecules. In this paper, we present the first distributed-memory many-GPU RI-MP2 algorithm explicitly designed to utilize hundreds of GPU accelerators for every step of the computation. Our novel algorithm achieves near-peak performance on GPU-based supercomputers through the development of a distributed memory algorithm for forming RI-MP2 intermediate tensors with zero internode communication, except for a single O(N²) asynchronous broadcast, and a distributed memory algorithm for the O(N⁵) energy reduction step, capable of sustaining near-peak performance on clusters with several hundred GPUs. Comparative analysis shows our implementation outperforms state-of-the-art quantum chemistry software by over 3.5 times in speed while achieving an 8-fold reduction in computational power consumption. Benchmarking on the Perlmutter supercomputer, our algorithm achieves 11.8 PFLOP/s (83% of peak performance) performing the RI-MP2 energy calculation on a 314-water cluster with 7850 primary and 30,144 auxiliary basis functions in 4 min on 180 nodes and 720 A100 GPUs. This performance represents a substantial improvement over traditional CPU-based methods, demonstrating significant time-to-solution and power consumption benefits of leveraging modern GPU-accelerated computing environments for quantum chemistry calculations. |
|---|---|
| Author | Snowdon, Calum; Barca, Giuseppe M J |
| Author_xml | – sequence: 1 givenname: Calum surname: Snowdon fullname: Snowdon, Calum organization: School of Computing, Australian National University, Canberra 2600, Australia – sequence: 2 givenname: Giuseppe M J orcidid: 0000-0001-5109-4279 surname: Barca fullname: Barca, Giuseppe M J organization: School of Computing and Information Systems, University of Melbourne, Melbourne 3010, Australia |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39422609 (View this record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| DBID | NPM 7X8 |
| DOI | 10.1021/acs.jctc.4c00814 |
| DatabaseName | PubMed MEDLINE - Academic |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | PubMed MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Chemistry |
| EISSN | 1549-9626 |
| ExternalDocumentID | 39422609 |
| Genre | Journal Article |
| GroupedDBID | 4.4 53G 55A 5GY 5VS 7~N AABXI ABBLG ABJNI ABLBI ABMVS ABQRX ABUCX ACGFS ACIWK ACS ADHLV AEESW AENEX AFEFF AHGAQ ALMA_UNASSIGNED_HOLDINGS AQSVZ BAANH CS3 CUPRZ D0L DU5 EBS ED~ F5P GGK GNL IH9 J9A JG~ NPM P2P RNS ROL UI2 VF5 VG9 W1F 7X8 |
| ID | FETCH-LOGICAL-a317t-bc849eda73766631f3013bc4ccf454c4442ebf586bffcd85dac66439bdb45b692 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001338407600001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1549-9626 |
| IngestDate | Fri Jul 11 12:26:25 EDT 2025 Mon Jul 21 05:54:48 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 21 |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a317t-bc849eda73766631f3013bc4ccf454c4442ebf586bffcd85dac66439bdb45b692 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0001-5109-4279 |
| PMID | 39422609 |
| PQID | 3117998362 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_3117998362 pubmed_primary_39422609 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-11-12 |
| PublicationDateYYYYMMDD | 2024-11-12 |
| PublicationDate_xml | – month: 11 year: 2024 text: 2024-11-12 day: 12 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Journal of chemical theory and computation |
| PublicationTitleAlternate | J Chem Theory Comput |
| PublicationYear | 2024 |
| SSID | ssj0033423 |
| Score | 2.4496715 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 9394 |
| Title | An Efficient RI-MP2 Algorithm for Distributed Many-GPU Architectures |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39422609 https://www.proquest.com/docview/3117998362 |
| Volume | 20 |
| WOSCitedRecordID | wos001338407600001 |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Efficient+RI-MP2+Algorithm+for+Distributed+Many-GPU+Architectures&rft.jtitle=Journal+of+chemical+theory+and+computation&rft.au=Snowdon%2C+Calum&rft.au=Barca%2C+Giuseppe+M+J&rft.date=2024-11-12&rft.issn=1549-9626&rft.eissn=1549-9626&rft.volume=20&rft.issue=21&rft.spage=9394&rft_id=info:doi/10.1021%2Facs.jctc.4c00814&rft.externalDBID=NO_FULL_TEXT |
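The abstract names the RI-MP2 energy and its O(N⁵) energy reduction step without giving the formula. As a minimal, hedged sketch (not the authors' algorithm), the standard closed-shell RI-MP2 energy is E = Σ_{ijab} (ia|jb)[2(ia|jb) - (ib|ja)] / (ε_i + ε_j - ε_a - ε_b), where the RI factorization approximates (ia|jb) ≈ Σ_P B^P_{ia} B^P_{jb}. The loop nest over i, j, a, b and the auxiliary index P is where the fifth-order cost comes from. Everything else here is an illustrative assumption: the helper names `ri_integral`, `partial_energy` and `ri_mp2_energy`, the `B[P][i][a]` layout, the round-robin split of the occupied index over hypothetical "ranks", and the final Python `sum` standing in for a distributed reduction.

```python
"""Toy closed-shell RI-MP2 energy with a simulated split-and-reduce.

Assumptions (illustrative only, not the paper's algorithm): the three-index
factors B[P][i][a] are already formed, the occupied index i is dealt
round-robin to hypothetical "ranks", and the final sum over ranks stands in
for the distributed energy reduction.
"""
from typing import Iterable, List

Tensor3 = List[List[List[float]]]  # indexed as B[P][i][a]


def ri_integral(B: Tensor3, i: int, a: int, j: int, b: int) -> float:
    """RI approximation (ia|jb) ~= sum_P B[P][i][a] * B[P][j][b]."""
    return sum(BP[i][a] * BP[j][b] for BP in B)


def partial_energy(B: Tensor3, eps_occ: List[float], eps_vir: List[float],
                   occ_indices: Iterable[int]) -> float:
    """Closed-shell MP2 contribution from the occupied indices one rank owns."""
    nocc, nvir = len(eps_occ), len(eps_vir)
    energy = 0.0
    for i in occ_indices:
        for j in range(nocc):
            for a in range(nvir):
                for b in range(nvir):
                    iajb = ri_integral(B, i, a, j, b)  # direct integral
                    ibja = ri_integral(B, i, b, j, a)  # exchange integral
                    denom = eps_occ[i] + eps_occ[j] - eps_vir[a] - eps_vir[b]
                    energy += iajb * (2.0 * iajb - ibja) / denom
    return energy


def ri_mp2_energy(B: Tensor3, eps_occ: List[float], eps_vir: List[float],
                  nranks: int = 1) -> float:
    """Give each hypothetical rank every nranks-th i, then sum the partials."""
    nocc = len(eps_occ)
    partials = [partial_energy(B, eps_occ, eps_vir, range(r, nocc, nranks))
                for r in range(nranks)]
    return sum(partials)  # stands in for a reduction across ranks


if __name__ == "__main__":
    # Made-up factors: 3 auxiliary functions, 2 occupied, 2 virtual orbitals.
    B = [
        [[0.30, 0.10], [0.05, 0.20]],
        [[0.02, 0.15], [0.25, 0.04]],
        [[0.12, 0.01], [0.03, 0.18]],
    ]
    eps_occ, eps_vir = [-1.2, -0.6], [0.4, 0.9]
    serial = ri_mp2_energy(B, eps_occ, eps_vir, nranks=1)
    split = ri_mp2_energy(B, eps_occ, eps_vir, nranks=2)
    print(f"E(MP2) serial={serial:.10f} split={split:.10f}")
```

Because each rank's partial energy depends only on its own occupied indices, the split result matches the serial one up to floating-point summation order, and the only cross-rank step in this toy is the final sum. Since occupied energies lie below virtual ones, every denominator is negative and the energy comes out non-positive. How the paper actually partitions this work across GPUs is not described in this record.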