Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context...

Bibliographic Details
Published in:	Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 69-80
Main Authors:	Heybrock, Simon; Joó, Bálint; Kalamkar, Dhiraj D.; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep
Format:	Conference Proceeding
Language:	English
Published:	Piscataway, NJ, USA; IEEE Press; 16.11.2014; IEEE
Series:	ACM Conferences
Subjects:	and very la; Applied computing > Physical sciences and engineering > Physics; Computing methodologies > Symbolic and algebraic manipulation > Symbolic and algebraic algorithms > Linear algebra algorithms; Domain decomposition; G.1.3 [Numerical Analysis]: Numerical Linear Algebra Sparse; Gold; Intel® Xeon Phi coprocessor; Jacobian matrices; Lattice QCD; Categories and subject descriptors: D.3.4 [Programming Languages]: Processors Optimization; Lattices; Layout; Linear systems; Mathematics of computing > Mathematical analysis > Numerical analysis > Computations on matrices; Mathematics of computing > Mathematical software; Prefetching; Software and its engineering > Software notations and tools > Compilers; structured; Vectors; Xeon Phi; domain decomposition; Intel coprocessor; lattice QCD
ISBN:	1479955000, 9781479955008
ISSN:	2167-4329

Abstract	The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
AbstractList	The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
Author	Kalamkar, Dhiraj D.; Dubey, Pradeep; Wettig, Tilo; Heybrock, Simon; Joó, Bálint; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan
Author_xml	– sequence: 1 givenname: Simon surname: Heybrock fullname: Heybrock, Simon organization: University of Regensburg, Germany – sequence: 2 givenname: Bálint surname: Joó fullname: Joó, Bálint organization: Thomas Jefferson National Accelerator Facility, Newport News, VA – sequence: 3 givenname: Dhiraj D. surname: Kalamkar fullname: Kalamkar, Dhiraj D. organization: Parallel Computing Lab, Intel Corporation, Bangalore, India – sequence: 4 givenname: Mikhail surname: Smelyanskiy fullname: Smelyanskiy, Mikhail organization: Parallel Computing Lab, Intel Corporation, Santa Clara, CA – sequence: 5 givenname: Karthikeyan surname: Vaidyanathan fullname: Vaidyanathan, Karthikeyan organization: Parallel Computing Lab, Intel Corporation, Bangalore, India – sequence: 6 givenname: Tilo surname: Wettig fullname: Wettig, Tilo organization: University of Regensburg, Germany – sequence: 7 givenname: Pradeep surname: Dubey fullname: Dubey, Pradeep organization: Parallel Computing Lab, Intel Corporation, Santa Clara, CA
BookMark	eNqFkM1Kw0AUhUesYFu7cesm4MpF6vwlmVlK_KsUVKrgbpjM3NDRJlMyAXHvk_gAPoSP4pM4pboWLly-ew4XzhmhQetbQOiQ4CkhWJ4uyinFhEfYQSPCCykzLiXb_YMMYzxAQ0ryIuWMyn00CeE5HgnPckKzIbqZ6753BpL78jx5df0ysb7Rrk0sGN-sfXC9820SZ9b2sPr6TJ4gwt3Sfb9_JMan684bCMF34QDt1XoVYPK7x-jx8uKhvE7nt1ez8myeasZ4n0JuRG2JEEwaQQqrK2p5RXGWWY6JpZprwziGrJAMJAgiqBZ1YUDjCteYszE62v51AKDWnWt096YKTGiMHtWTrapNoyrvX4IiWG3aUotSbdqKoKrOQR29x_972Q8L82fb
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/SC.2014.11
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	1479954993 9781479955008 9781479954995 1479955000
EndPage	80
ExternalDocumentID	7012993
Genre	orig-research
GroupedDBID	6IE 6IF 6IK 6IL 6IN AAJGR ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IEGSK IERZE OCL RIE RIL 6IH AAWTH ABLEC ADZIZ CHZPO IPLJI
ID	FETCH-LOGICAL-a334t-e6c8fd18839c817dab2d4b2055d401d2a4ac340e5793e9e8182a8f7cea0b0f043
IEDL.DBID	RIE
ISBN	1479955000 9781479955008
ISICitedReferencesCount	22
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000393484400006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN	2167-4329
IngestDate	Wed Aug 27 01:54:18 EDT 2025 Wed Jan 31 06:40:24 EST 2024
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Keywords	Xeon Phi; domain decomposition; Intel coprocessor; lattice QCD
Language	English
LinkModel	DirectLink
MeetingName	SC '14: International Conference for High Performance Computing, Networking, Storage and Analysis
MergedId	FETCHMERGED-LOGICAL-a334t-e6c8fd18839c817dab2d4b2055d401d2a4ac340e5793e9e8182a8f7cea0b0f043
OpenAccessLink	https://www.osti.gov/servlets/purl/1169292
PageCount	12
ParticipantIDs	acm_books_10_1109_SC_2014_11_brief ieee_primary_7012993 acm_books_10_1109_SC_2014_11
PublicationCentury	2000
PublicationDate	20141116 2014-Nov.
PublicationDateYYYYMMDD	2014-11-16 2014-11-01
PublicationDate_xml	– month: 11 year: 2014 text: 20141116 day: 16
PublicationDecade	2010
PublicationPlace	Piscataway, NJ, USA
PublicationPlace_xml	– name: Piscataway, NJ, USA
PublicationSeriesTitle	ACM Conferences
PublicationTitle	Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev	SC
PublicationYear	2014
Publisher	IEEE Press IEEE
Publisher_xml	– name: IEEE Press – name: IEEE
SSID	ssj0001456125 ssj0001947932 ssj0003204180
Score	1.7410434
Snippet	The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale...
SourceID	ieee acm
SourceType	Publisher
StartPage	69
SubjectTerms	and very la; Applied computing -- Physical sciences and engineering -- Physics; Computing methodologies -- Symbolic and algebraic manipulation -- Symbolic and algebraic algorithms -- Linear algebra algorithms; Domain decomposition; G.1.3 [Numerical Analysis]: Numerical Linear Algebra Sparse; Gold; Intel® Xeon Phi coprocessor; Jacobian matrices; Lattice QCD; Categories and subject descriptors: D.3.4 [Programming Languages]: Processors Optimization; Lattices; Layout; Linear systems; Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Computations on matrices; Mathematics of computing -- Mathematical software; Prefetching; Software and its engineering -- Software notations and tools -- Compilers; structured; Vectors
Title	Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors
URI	https://ieeexplore.ieee.org/document/7012993
WOSCitedRecordID	wos000393484400006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link	http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3LSsNAFL1YceGqPirWF4O4NHUyM-lk1qmPRSkVH3QXJpMbzKKNtKmf5Uf4Zc5MY0UQQcgiNyEQDhPu3JN77gG40JmShsssyKTIAsF0FCibFQJthJIRhqH2urXnoRyN4slEjTfgcq2FQUTffIY9d-r_5eeVWTqq7Eo61kTxFrSklCut1jef4nYCzdr0sXKcEVvHnFEReiM15kd9c6aaaaUhVVcPievyEj1nJNTSZvrDa8Wnmpv2_15yBzrfmj0yXmejXdjA2R60v0wbSPMN78PtUNeu4Y3cJwPiWFgyqKa6nJEBuvbypoeL2MMrTD7eyQRtMH4pSVIFja6gmi868HRz_ZjcBY2bQqA5F3WAfRMXeRjbHZGJQ5nrjOUiYzSKcltj5UwLbbigGFnYUKFN5EzHhTSoaUYLKvgBbM6qGR4CkVKxos-xHxmb3DTGLM9VqIrQBjEt-l04sRimrkxYpL7KoCp9SFIHsQ26cP7X7TSbl1h0Yd9hm76uhm6kDaxHv18-hm339EokeAKb9XyJp7Bl3upyMT_zK-UTBNW2Dw
linkProvider	IEEE
linkToHtml	http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3bSsNAEB28gT55x3pdxEeje0s3-5x6w1oqXuhb2Gwm2Acbaauf5Uf4Ze5uY0UQQchDJoEQDhtm52TOHIAjk2tlhcqjXMk8ktzEkXZZITJWahUjYybo1h7bqtNJej3dnYHjqRYGEUPzGZ740_Avv6jsq6fKTpVnTbSYhflYSs4maq1vRsXvBerVGWLtWSM-jQWnkgUrNR6GfQuu63mljOrTu9T3eckTbyU0a-zzD7eVkGzOl__3miuw8a3aI91pPlqFGRyswfKXbQOpv-J1uGibsW95I7dpi3gelrSqZ9MfkBb6BvO6i4u4I2hMPt5JD13QfeqTtIpqZUE1HG3Aw_nZfXoZ1X4KkRFCjiNs2qQsWOL2RDZhqjA5L2TOaRwXrsoquJHGCkkxdrChRpfKuUlKZdHQnJZUik2YG1QD3AKilOZlU2Azti69GUx4UWimS-aChJbNBuw6DDNfKIyyUGdQnd2lmYfYBQ04_Ot2lg_7WDZg3WObvUzGbmQ1rNu_Xz6Axcv7m3bWvupc78CSf9JEMrgLc-PhK-7Bgn0b90fD_bBqPgGxSrlW
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+of+the+International+Conference+for+High+Performance+Computing%2C+Networking%2C+Storage+and+Analysis&rft.atitle=Lattice+QCD+with+domain+decomposition+on+Intel%C2%AE+Xeon+Phi%E2%84%A2+co-processors&rft.au=Heybrock%2C+Simon&rft.au=Jo%C3%B3%2C+B%C3%A1lint&rft.au=Kalamkar%2C+Dhiraj+D.&rft.au=Smelyanskiy%2C+Mikhail&rft.series=ACM+Conferences&rft.date=2014-11-16&rft.pub=IEEE+Press&rft.isbn=1479955000&rft.spage=69&rft.epage=80&rft_id=info:doi/10.1109%2FSC.2014.11
thumbnail_l	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2167-4329&client=summon
thumbnail_m	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2167-4329&client=summon
thumbnail_s	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2167-4329&client=summon