Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning

We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-b...

Bibliographic details
Published in: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-12
Main authors: Williams, Samuel; Oliker, Leonid; Carter, Jonathan; Shalf, John
Format: Conference paper
Language: English
Publication details: New York, NY, USA, ACM, 12 November 2011
IEEE
Series: ACM Conferences
ISBN:145030771X, 9781450307710
ISSN:2167-4329
Abstract We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-based supercomputing platforms. Our examined application is a structured grid-based Lattice Boltzmann computation that simulates homogeneous isotropic turbulence in magnetohydrodynamics. First, we examine sophisticated sequential auto-tuning techniques including loop transformations, virtual vectorization, and use of ISA-specific intrinsics. Next, we present a variety of parallel optimization approaches including programming model exploration (flat MPI, MPI/OpenMP, and MPI/Pthreads), as well as data and thread decomposition strategies designed to mitigate communication bottlenecks. Finally, we evaluate the impact of our hierarchical tuning techniques using a variety of problem sizes via large-scale simulations on state-of-the-art Cray XT4, Cray XE6, and IBM BlueGene/P platforms. Results show that our unique tuning approach improves performance and energy requirements by up to 3.4x using 49,152 cores, while providing a portable optimization methodology for a variety of numerical methods on forthcoming HPC systems.
Author Carter, Jonathan
Shalf, John
Williams, Samuel
Oliker, Leonid
Author_xml – sequence: 1
  givenname: Samuel
  surname: Williams
  fullname: Williams, Samuel
  email: SWWilliams@lbl.gov
  organization: Lawrence Berkeley National Laboratory
– sequence: 2
  givenname: Leonid
  surname: Oliker
  fullname: Oliker, Leonid
  email: LOliker@lbl.gov
  organization: Lawrence Berkeley National Laboratory
– sequence: 3
  givenname: Jonathan
  surname: Carter
  fullname: Carter, Jonathan
  email: JTCarter@lbl.gov
  organization: Lawrence Berkeley National Laboratory
– sequence: 4
  givenname: John
  surname: Shalf
  fullname: Shalf, John
  email: JShalf@lbl.gov
  organization: Lawrence Berkeley National Laboratory
ContentType Conference Proceeding
Copyright 2011 ACM
Copyright_xml – notice: 2011 ACM
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/2063384.2063458
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 145030771X
9781450307710
EndPage 12
ExternalDocumentID 6114428
Genre orig-research
GroupedDBID 6IE
6IF
6IK
6IL
6IN
AAJGR
ACM
ADPZR
ALMA_UNASSIGNED_HOLDINGS
APO
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
GUFHI
IEGSK
IERZE
OCL
RIB
RIC
RIE
RIL
6IH
AAWTH
ABLEC
ADZIZ
CHZPO
IPLJI
ID FETCH-LOGICAL-a162t-e6aed6bb99744498bd2ef657d51ca4a729b85d6b33a947f2b7c2e972faed940e3
IEDL.DBID RIE
ISBN 145030771X
9781450307710
ISSN 2167-4329
IngestDate Wed Aug 27 03:18:43 EDT 2025
Wed Jan 31 06:47:53 EST 2024
IsPeerReviewed false
IsScholarly false
Keywords hybrid programming models
BlueGene
Lattice Boltzmann
SIMD
OpenMP
auto-tuning
Language English
License Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org
LinkModel DirectLink
MeetingName SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis
MergedId FETCHMERGED-LOGICAL-a162t-e6aed6bb99744498bd2ef657d51ca4a729b85d6b33a947f2b7c2e972faed940e3
PageCount 12
ParticipantIDs acm_books_10_1145_2063384_2063458
ieee_primary_6114428
acm_books_10_1145_2063384_2063458_brief
PublicationCentury 2000
PublicationDate 20111112
2011-Nov.
PublicationDateYYYYMMDD 2011-11-12
2011-11-01
PublicationDate_xml – month: 11
  year: 2011
  text: 20111112
  day: 12
PublicationDecade 2010
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
PublicationTitleAbbrev SC
PublicationYear 2011
Publisher ACM
IEEE
Publisher_xml – name: ACM
– name: IEEE
SSID ssj0001125516
ssj0003204180
Score 1.611737
SourceID ieee
acm
SourceType Publisher
StartPage 1
SubjectTerms Auto-tuning
BlueGene
Distribution functions
Hybrid Programming Models
Lattice Boltzmann
Lattices
Mathematics of computing -- Mathematical analysis -- Numerical analysis
Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Numerical differentiation
Mathematics of computing -- Mathematical software
Multicore processing
OpenMP
Optimization
SIMD
Theory of computation -- Design and analysis of algorithms
Three dimensional displays
Tuning
Vectors
Title Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning
URI https://ieeexplore.ieee.org/document/6114428
hasFullText 1
inHoldings 1
isFullTextHit
isPrint