Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning

Bibliographic details
Published in:	2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-12
Main authors:	Williams, Samuel; Oliker, Leonid; Carter, Jonathan; Shalf, John
Format:	Conference paper
Language:	English
Publication details:	New York, NY, USA: ACM; IEEE, 12 November 2011
Series:	ACM Conferences
Subjects:	Auto-tuning; BlueGene; Distribution functions; Hybrid programming models; Lattice Boltzmann; Lattices; Mathematics of computing > Mathematical analysis > Numerical analysis; Mathematics of computing > Mathematical analysis > Numerical analysis > Numerical differentiation; Mathematics of computing > Mathematical software; Multicore processing; OpenMP; Optimization; SIMD; Theory of computation > Design and analysis of algorithms; Three dimensional displays; Tuning; Vectors
ISBN:	145030771X, 9781450307710
ISSN:	2167-4329

Abstract	We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-based supercomputing platforms. Our examined application is a structured grid-based Lattice Boltzmann computation that simulates homogeneous isotropic turbulence in magnetohydrodynamics. First, we examine sophisticated sequential auto-tuning techniques including loop transformations, virtual vectorization, and use of ISA-specific intrinsics. Next, we present a variety of parallel optimization approaches including programming model exploration (flat MPI, MPI/OpenMP, and MPI/Pthreads), as well as data and thread decomposition strategies designed to mitigate communication bottlenecks. Finally, we evaluate the impact of our hierarchical tuning techniques using a variety of problem sizes via large-scale simulations on state-of-the-art Cray XT4, Cray XE6, and IBM BlueGene/P platforms. Results show that our unique tuning approach improves performance and energy requirements by up to 3.4x using 49,152 cores, while providing a portable optimization methodology for a variety of numerical methods on forthcoming HPC systems.
Author	Carter, Jonathan; Shalf, John; Williams, Samuel; Oliker, Leonid
Author_xml	1. Williams, Samuel (SWWilliams@lbl.gov), Lawrence Berkeley National Laboratory; 2. Oliker, Leonid (LOliker@lbl.gov), Lawrence Berkeley National Laboratory; 3. Carter, Jonathan (JTCarter@lbl.gov), Lawrence Berkeley National Laboratory; 4. Shalf, John (JShalf@lbl.gov), Lawrence Berkeley National Laboratory
ContentType	Conference Proceeding
Copyright	2011 ACM
Copyright_xml	– notice: 2011 ACM
DOI	10.1145/2063384.2063458
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
Discipline	Computer Science
EISBN	145030771X 9781450307710
EndPage	12
ExternalDocumentID	6114428
Genre	orig-research
ISBN	145030771X 9781450307710
ISSN	2167-4329
IngestDate	Wed Aug 27 03:18:43 EDT 2025 Wed Jan 31 06:47:53 EST 2024
IsPeerReviewed	false
IsScholarly	false
Keywords	hybrid programming models; BlueGene; Lattice Boltzmann; SIMD; OpenMP; auto-tuning
Language	English
License	Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org
MeetingName	SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis
PageCount	12
ParticipantIDs	acm_books_10_1145_2063384_2063458 ieee_primary_6114428 acm_books_10_1145_2063384_2063458_brief
PublicationCentury	2000
PublicationDate	20111112 2011-Nov.
PublicationDateYYYYMMDD	2011-11-12 2011-11-01
PublicationDate_xml	– month: 11 year: 2011 text: 20111112 day: 12
PublicationDecade	2010
PublicationPlace	New York, NY, USA
PublicationPlace_xml	– name: New York, NY, USA
PublicationSeriesTitle	ACM Conferences
PublicationTitle	2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
PublicationTitleAbbrev	SC
PublicationYear	2011
Publisher	ACM IEEE
Publisher_xml	– name: ACM – name: IEEE
SourceID	ieee acm
SourceType	Publisher
StartPage	1
SubjectTerms	Auto-tuning; BlueGene; Distribution functions; Hybrid Programming Models; Lattice Boltzmann; Lattices; Mathematics of computing -- Mathematical analysis -- Numerical analysis; Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Numerical differentiation; Mathematics of computing -- Mathematical software; Multicore processing; OpenMP; Optimization; SIMD; Theory of computation -- Design and analysis of algorithms; Three dimensional displays; Tuning; Vectors
Title	Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning
URI	https://ieeexplore.ieee.org/document/6114428