Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors


Bibliographic details
Published in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 69-80
Main authors: Heybrock, Simon; Joó, Bálint; Kalamkar, Dhiraj D.; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep
Format: Conference paper
Language: English
Published: Piscataway, NJ, USA: IEEE Press, 16 November 2014
IEEE
Series: ACM Conferences
ISBN: 1479955000, 9781479955008
ISSN: 2167-4329
Description
Summary: The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision, the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
DOI: 10.1109/SC.2014.11