Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors

Bibliographic Details
Published in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 69-80
Main Authors: Heybrock, Simon; Joó, Bálint; Kalamkar, Dhiraj D.; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep
Format: Conference Proceeding
Language: English
Published: Piscataway, NJ, USA: IEEE Press, 16.11.2014
IEEE
Series: ACM Conferences
Subjects:
ISBN: 1479955000, 9781479955008
ISSN: 2167-4329
Abstract The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single and half precision, the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
Author Kalamkar, Dhiraj D.
Dubey, Pradeep
Wettig, Tilo
Heybrock, Simon
Joó, Bálint
Smelyanskiy, Mikhail
Vaidyanathan, Karthikeyan
Author_xml – sequence: 1
  givenname: Simon
  surname: Heybrock
  fullname: Heybrock, Simon
  organization: University of Regensburg, Germany
– sequence: 2
  givenname: Bálint
  surname: Joó
  fullname: Joó, Bálint
  organization: Thomas Jefferson National Accelerator Facility, Newport News, VA
– sequence: 3
  givenname: Dhiraj D.
  surname: Kalamkar
  fullname: Kalamkar, Dhiraj D.
  organization: Parallel Computing Lab, Intel Corporation, Bangalore, India
– sequence: 4
  givenname: Mikhail
  surname: Smelyanskiy
  fullname: Smelyanskiy, Mikhail
  organization: Parallel Computing Lab, Intel Corporation, Santa Clara, CA
– sequence: 5
  givenname: Karthikeyan
  surname: Vaidyanathan
  fullname: Vaidyanathan, Karthikeyan
  organization: Parallel Computing Lab, Intel Corporation, Bangalore, India
– sequence: 6
  givenname: Tilo
  surname: Wettig
  fullname: Wettig, Tilo
  organization: University of Regensburg, Germany
– sequence: 7
  givenname: Pradeep
  surname: Dubey
  fullname: Dubey, Pradeep
  organization: Parallel Computing Lab, Intel Corporation, Santa Clara, CA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/SC.2014.11
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1479954993
9781479955008
9781479954995
1479955000
EndPage 80
ExternalDocumentID 7012993
Genre orig-research
GroupedDBID 6IE
6IF
6IK
6IL
6IN
AAJGR
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
IEGSK
IERZE
OCL
RIE
RIL
6IH
AAWTH
ABLEC
ADZIZ
CHZPO
IPLJI
ID FETCH-LOGICAL-a334t-e6c8fd18839c817dab2d4b2055d401d2a4ac340e5793e9e8182a8f7cea0b0f043
IEDL.DBID RIE
ISBN 1479955000
9781479955008
ISICitedReferencesCount 22
ISSN 2167-4329
IngestDate Wed Aug 27 01:54:18 EDT 2025
Wed Jan 31 06:40:24 EST 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Keywords Xeon Phi
domain decomposition
Intel
coprocessor
lattice QCD
Language English
LinkModel DirectLink
MeetingName SC '14: International Conference for High Performance Computing, Networking, Storage and Analysis
MergedId FETCHMERGED-LOGICAL-a334t-e6c8fd18839c817dab2d4b2055d401d2a4ac340e5793e9e8182a8f7cea0b0f043
OpenAccessLink https://www.osti.gov/servlets/purl/1169292
PageCount 12
ParticipantIDs acm_books_10_1109_SC_2014_11_brief
ieee_primary_7012993
acm_books_10_1109_SC_2014_11
PublicationCentury 2000
PublicationDate 20141116
2014-Nov.
PublicationDateYYYYMMDD 2014-11-16
2014-11-01
PublicationDate_xml – month: 11
  year: 2014
  text: 20141116
  day: 16
PublicationDecade 2010
PublicationPlace Piscataway, NJ, USA
PublicationPlace_xml – name: Piscataway, NJ, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev SC
PublicationYear 2014
Publisher IEEE Press
IEEE
Publisher_xml – name: IEEE Press
– name: IEEE
SSID ssj0001456125
ssj0001947932
ssj0003204180
SourceID ieee
acm
SourceType Publisher
StartPage 69
SubjectTerms Applied computing -- Physical sciences and engineering -- Physics
Computing methodologies -- Symbolic and algebraic manipulation -- Symbolic and algebraic algorithms -- Linear algebra algorithms
D.3.4 [Programming Languages]: Processors -- Optimization
Domain decomposition
G.1.3 [Numerical Analysis]: Numerical Linear Algebra -- Sparse, structured, and very la
Gold
Intel® Xeon Phi coprocessor
Jacobian matrices
Lattice QCD
Lattices
Layout
Linear systems
Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Computations on matrices
Mathematics of computing -- Mathematical software
Prefetching
Software and its engineering -- Software notations and tools -- Compilers
Vectors
Title Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors
URI https://ieeexplore.ieee.org/document/7012993
WOSCitedRecordID wos000393484400006
linkProvider IEEE