A Locality-Based Threading Algorithm for the Configuration-Interaction Method
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we...
| Published in: | 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1178-1187 |
|---|---|
| Main authors: | Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; McElvain, Kenneth |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 3 July 2017 |
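| Subjects: | bigstick; configuration-interaction method; hybrid programming model; Instruction sets; Ivy Bridge; Knights Landing; locality-based threading algorithm; Manycore; Memory management; MPI; multithreading; Neutrons; OpenMP; Partitioning algorithms; Protons; Sparse matrices |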
| Online access: | Full text: https://ieeexplore.ieee.org/document/7965171 |
| Abstract | The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node. |
|---|---|
| Author | Shan, Hongzhang; McElvain, Kenneth; Johnson, Calvin; Williams, Samuel |
| Author_xml | 1. Shan, Hongzhang (hshan@lbl.gov), Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA; 2. Williams, Samuel (swwilliams@lbl.gov), Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA; 3. Johnson, Calvin (cjohnson@mail.sdsu.edu), Dept. of Phys., San Diego State Univ., San Diego, CA, USA; 4. McElvain, Kenneth (kenmcelvain@me.com), Dept. of Phys., Univ. of California, Berkeley, Berkeley, CA, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/IPDPSW.2017.15 |
| EISBN | 9781538634080 1538634082 |
| EndPage | 1187 |
| ExternalDocumentID | 7965171 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| IngestDate | Thu Jun 29 18:38:09 EDT 2023 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| OpenAccessLink | https://escholarship.org/uc/item/9sf515zf |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_7965171 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-07-03 |
| PublicationDateYYYYMMDD | 2017-07-03 |
| PublicationDate_xml | – month: 07 year: 2017 text: 2017-07-03 day: 03 |
| PublicationDecade | 2010 |
| PublicationTitle | 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
| PublicationTitleAbbrev | IPDPSW |
| PublicationYear | 2017 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1178 |
| SubjectTerms | bigstick; configuration-interaction method; hybrid programming model; Instruction sets; Ivy Bridge; Knights Landing; locality-based threading algorithm; Manycore; Memory management; MPI; multithreading; Neutrons; OpenMP; Partitioning algorithms; Protons; Sparse matrices |
| Title | A Locality-Based Threading Algorithm for the Configuration-Interaction Method |
| URI | https://ieeexplore.ieee.org/document/7965171 |