A Locality-Based Threading Algorithm for the Configuration-Interaction Method

Bibliographic details
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1178-1187
Main authors: Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; McElvain, Kenneth
Format: Conference paper
Language: English
Published: IEEE, 3 July 2017
Abstract: The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
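The abstract does not spell out how the workload is partitioned, so the following is only a generic sketch of one common locality-oriented scheme, not the algorithm from the paper. It assumes the work can be viewed as rows of a sparse matrix with a known nonzero count per row, and it gives each thread one contiguous block of rows with roughly equal work, so that every thread writes to its own slice of the output vector. The function name `partition_rows_by_work` and the `row_nnz` input are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_rows_by_work(row_nnz, num_threads):
    """Split rows into num_threads contiguous blocks of roughly equal work.

    row_nnz[i] is an assumed work estimate for row i (for example, its number
    of nonzeros). Contiguous blocks mean each thread touches one slice of the
    output, which is the locality property this sketch illustrates.
    Returns half-open (start, end) row ranges, one per thread.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    n = len(row_nnz)
    prefix = list(accumulate(row_nnz))  # prefix[i] = total work in rows 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for t in range(1, num_threads):
        target = total * t / num_threads
        # The row whose cumulative work first reaches the target stays in
        # the current block; the next block starts right after it.
        cut = min(bisect_left(prefix, target) + 1, n)
        bounds.append(max(cut, bounds[-1]))  # keep boundaries non-decreasing
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Seven rows with uneven work, split across three threads.
    print(partition_rows_by_work([4, 1, 1, 1, 1, 4, 4], 3))
```

In a hybrid MPI+OpenMP code, each thread could look up its own range by thread index (for instance via `omp_get_thread_num()`) and process only those rows. Blocks may be empty when there are more threads than rows.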
Author_xml – sequence: 1
  givenname: Hongzhang
  surname: Shan
  fullname: Shan, Hongzhang
  email: hshan@lbl.gov
  organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
– sequence: 2
  givenname: Samuel
  surname: Williams
  fullname: Williams, Samuel
  email: swwilliams@lbl.gov
  organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
– sequence: 3
  givenname: Calvin
  surname: Johnson
  fullname: Johnson, Calvin
  email: cjohnson@mail.sdsu.edu
  organization: Dept. of Phys., San Diego State Univ., San Diego, CA, USA
– sequence: 4
  givenname: Kenneth
  surname: McElvain
  fullname: McElvain, Kenneth
  email: kenmcelvain@me.com
  organization: Dept. of Phys., Univ. of California, Berkeley, Berkeley, CA, USA
CODEN IEEPAD
DOI 10.1109/IPDPSW.2017.15
EISBN 9781538634080
1538634082
OpenAccessLink https://escholarship.org/uc/item/9sf515zf
PageCount 10
SubjectTerms bigstick
configuration-interaction method
hybrid programming model
Instruction sets
Ivy bridge
knights landing
locality-based threading algorithm
Manycore
Memory management
MPI
multithreading
Neutrons
OpenMP
Partitioning algorithms
Protons
Sparse matrices
URI https://ieeexplore.ieee.org/document/7965171