Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors
The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context...
| Published in: | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis pp. 69 - 80 |
|---|---|
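| Main Authors: | Heybrock, Simon; Joó, Bálint; Kalamkar, Dhiraj D.; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep |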
| Format: | Conference Proceeding |
| Language: | English |
| Published: | Piscataway, NJ, USA; IEEE Press; 16.11.2014; IEEE |
| Series: | ACM Conferences |
| Subjects: | |
| ISBN: | 1479955000, 9781479955008 |
| ISSN: | 2167-4329 |
| Online Access: | Get full text |
| Abstract | The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5. |
|---|---|
| Author | Kalamkar, Dhiraj D.; Dubey, Pradeep; Wettig, Tilo; Heybrock, Simon; Joó, Bálint; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan |
| Author_xml | – sequence: 1 givenname: Simon surname: Heybrock fullname: Heybrock, Simon organization: University of Regensburg, Germany – sequence: 2 givenname: Bálint surname: Joó fullname: Joó, Bálint organization: Thomas Jefferson National Accelerator Facility, Newport News, VA – sequence: 3 givenname: Dhiraj D. surname: Kalamkar fullname: Kalamkar, Dhiraj D. organization: Parallel Computing Lab, Intel Corporation, Bangalore, India – sequence: 4 givenname: Mikhail surname: Smelyanskiy fullname: Smelyanskiy, Mikhail organization: Parallel Computing Lab, Intel Corporation, Santa Clara, CA – sequence: 5 givenname: Karthikeyan surname: Vaidyanathan fullname: Vaidyanathan, Karthikeyan organization: Parallel Computing Lab, Intel Corporation, Bangalore, India – sequence: 6 givenname: Tilo surname: Wettig fullname: Wettig, Tilo organization: University of Regensburg, Germany – sequence: 7 givenname: Pradeep surname: Dubey fullname: Dubey, Pradeep organization: Parallel Computing Lab, Intel Corporation, Santa Clara, CA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/SC.2014.11 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1479954993 9781479955008 9781479954995 1479955000 |
| EndPage | 80 |
| ExternalDocumentID | 7012993 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IEGSK IERZE OCL RIE RIL 6IH AAWTH ABLEC ADZIZ CHZPO IPLJI |
| ID | FETCH-LOGICAL-a334t-e6c8fd18839c817dab2d4b2055d401d2a4ac340e5793e9e8182a8f7cea0b0f043 |
| IEDL.DBID | RIE |
| ISBN | 1479955000 9781479955008 |
| ISICitedReferencesCount | 22 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000393484400006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2167-4329 |
| IngestDate | Wed Aug 27 01:54:18 EDT 2025 Wed Jan 31 06:40:24 EST 2024 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | Xeon Phi domain decomposition Intel coprocessor lattice QCD |
| Language | English |
| LinkModel | DirectLink |
| MeetingName | SC '14: International Conference for High Performance Computing, Networking, Storage and Analysis |
| MergedId | FETCHMERGED-LOGICAL-a334t-e6c8fd18839c817dab2d4b2055d401d2a4ac340e5793e9e8182a8f7cea0b0f043 |
| OpenAccessLink | https://www.osti.gov/servlets/purl/1169292 |
| PageCount | 12 |
| ParticipantIDs | acm_books_10_1109_SC_2014_11_brief ieee_primary_7012993 acm_books_10_1109_SC_2014_11 |
| PublicationCentury | 2000 |
| PublicationDate | 20141116 2014-Nov. |
| PublicationDateYYYYMMDD | 2014-11-16 2014-11-01 |
| PublicationDate_xml | – month: 11 year: 2014 text: 20141116 day: 16 |
| PublicationDecade | 2010 |
| PublicationPlace | Piscataway, NJ, USA |
| PublicationPlace_xml | – name: Piscataway, NJ, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2014 |
| Publisher | IEEE Press IEEE |
| Publisher_xml | – name: IEEE Press – name: IEEE |
| SSID | ssj0001456125 ssj0001947932 ssj0003204180 |
| Score | 1.7410434 |
| Snippet | The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale... |
| SourceID | ieee acm |
| SourceType | Publisher |
| StartPage | 69 |
| SubjectTerms | and very la; Applied computing -- Physical sciences and engineering -- Physics; Computing methodologies -- Symbolic and algebraic manipulation -- Symbolic and algebraic algorithms -- Linear algebra algorithms; Domain decomposition; G.1.3 [Numerical Analysis]: Numerical Linear Algebra Sparse; Gold; Intel® Xeon Phi coprocessor; Jacobian matrices; Lattice QCD; Categories and subject descriptors: D.3.4 [Programming Languages]: Processors Optimization; Lattices; Layout; Linear systems; Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Computations on matrices; Mathematics of computing -- Mathematical software; Prefetching; Software and its engineering -- Software notations and tools -- Compilers; structured; Vectors |
| Title | Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors |
| URI | https://ieeexplore.ieee.org/document/7012993 |
| WOSCitedRecordID | wos000393484400006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |

