Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning
We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-b...
| Published in: | 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-12 |
|---|---|
| Main authors: | Williams, Samuel; Oliker, Leonid; Carter, Jonathan; Shalf, John |
| Format: | Conference paper |
| Language: | English |
| Publication details: | New York, NY, USA; ACM; 12 November 2011; IEEE |
| Series: | ACM Conferences |
| Subject: | |
| ISBN: | 145030771X, 9781450307710 |
| ISSN: | 2167-4329 |
| Online access: | Get full text |
| Abstract | We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-based supercomputing platforms. Our examined application is a structured grid-based Lattice Boltzmann computation that simulates homogeneous isotropic turbulence in magnetohydrodynamics. First, we examine sophisticated sequential auto-tuning techniques including loop transformations, virtual vectorization, and use of ISA-specific intrinsics. Next, we present a variety of parallel optimization approaches including programming model exploration (flat MPI, MPI/OpenMP, and MPI/Pthreads), as well as data and thread decomposition strategies designed to mitigate communication bottlenecks. Finally, we evaluate the impact of our hierarchical tuning techniques using a variety of problem sizes via large-scale simulations on state-of-the-art Cray XT4, Cray XE6, and IBM BlueGene/P platforms. Results show that our unique tuning approach improves performance and energy requirements by up to 3.4x using 49,152 cores, while providing a portable optimization methodology for a variety of numerical methods on forthcoming HPC systems. |
|---|---|
| Author | Carter, Jonathan; Shalf, John; Williams, Samuel; Oliker, Leonid |
| Author_xml | – sequence: 1 givenname: Samuel surname: Williams fullname: Williams, Samuel email: SWWilliams@lbl.gov organization: Lawrence Berkeley National Laboratory – sequence: 2 givenname: Leonid surname: Oliker fullname: Oliker, Leonid email: LOliker@lbl.gov organization: Lawrence Berkeley National Laboratory – sequence: 3 givenname: Jonathan surname: Carter fullname: Carter, Jonathan email: JTCarter@lbl.gov organization: Lawrence Berkeley National Laboratory – sequence: 4 givenname: John surname: Shalf fullname: Shalf, John email: JShalf@lbl.gov organization: Lawrence Berkeley National Laboratory |
| ContentType | Conference Proceeding |
| Copyright | 2011 ACM |
| Copyright_xml | – notice: 2011 ACM |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/2063384.2063458 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 145030771X 9781450307710 |
| EndPage | 12 |
| ExternalDocumentID | 6114428 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR ACM ADPZR ALMA_UNASSIGNED_HOLDINGS APO BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK GUFHI IEGSK IERZE OCL RIB RIC RIE RIL 6IH AAWTH ABLEC ADZIZ CHZPO IPLJI |
| ID | FETCH-LOGICAL-a162t-e6aed6bb99744498bd2ef657d51ca4a729b85d6b33a947f2b7c2e972faed940e3 |
| IEDL.DBID | RIE |
| ISBN | 145030771X 9781450307710 |
| ISSN | 2167-4329 |
| IngestDate | Wed Aug 27 03:18:43 EDT 2025 Wed Jan 31 06:47:53 EST 2024 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | hybrid programming models; BlueGene; Lattice Boltzmann; SIMD; OpenMP; auto-tuning |
| Language | English |
| License | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org |
| LinkModel | DirectLink |
| MeetingName | SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis |
| MergedId | FETCHMERGED-LOGICAL-a162t-e6aed6bb99744498bd2ef657d51ca4a729b85d6b33a947f2b7c2e972faed940e3 |
| PageCount | 12 |
| ParticipantIDs | acm_books_10_1145_2063384_2063458 ieee_primary_6114428 acm_books_10_1145_2063384_2063458_brief |
| PublicationCentury | 2000 |
| PublicationDate | 20111112 2011-Nov. |
| PublicationDateYYYYMMDD | 2011-11-12 2011-11-01 |
| PublicationDate_xml | – month: 11 year: 2011 text: 20111112 day: 12 |
| PublicationDecade | 2010 |
| PublicationPlace | New York, NY, USA |
| PublicationPlace_xml | – name: New York, NY, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC) |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2011 |
| Publisher | ACM IEEE |
| Publisher_xml | – name: ACM – name: IEEE |
| SSID | ssj0001125516 ssj0003204180 |
| Score | 1.611737 |
| SourceID | ieee acm |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Auto-tuning; BlueGene; Distribution functions; Hybrid Programming Models; Lattice Boltzmann; Lattices; Mathematics of computing -- Mathematical analysis -- Numerical analysis; Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Numerical differentiation; Mathematics of computing -- Mathematical software; Multicore processing; OpenMP; Optimization; SIMD; Theory of computation -- Design and analysis of algorithms; Three dimensional displays; Tuning; Vectors |
| Title | Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning |
| URI | https://ieeexplore.ieee.org/document/6114428 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |

