Communication-hiding programming for clusters with multi-coprocessor nodes

Bibliographic details
Published in:	Concurrency and computation, Vol. 27, No. 16, pp. 4172-4185
Main authors:	Dong, Xinnan; Wen, Mei; Chai, Jun; Cai, Xing; Zhao, Mandan; Zhang, Chunyuan
Format:	Journal Article
Language:	English
Published:	Blackwell Publishing Ltd, 1 November 2015
Subjects:	Clusters; Computation; Computer programs; Coprocessors; hybrid programming; Intel Xeon Phi coprocessor; offload model; Programming; SCIF; Sheds; Symmetry; Three dimensional; Tianhe-2
ISSN:	1532-0626, 1532-0634

Description
Summary:	Future exascale systems are expected to adopt compute nodes that incorporate many accelerators. To shed some light on the upcoming software challenge, this paper investigates the particular topic of programming clusters that have multiple Xeon Phi coprocessors in each compute node. A new offload approach is considered for intra‐node communication, which combines Intel's coprocessor offload infrastructure (COI) and symmetric communication interface (SCIF) APIs to achieve low latency. While the conventional pragma‐based offload approach allows simpler programming, the COI‐SCIF approach has three advantages: (1) lower overhead associated with launching offloaded code, (2) higher data transfer bandwidths, and (3) more advanced asynchrony between computation and data movement. The low‐level COI‐SCIF approach is also shown to have benefits over the MPI‐OpenMP counterpart, which belongs to the symmetric usage mode. Moreover, a hybrid programming strategy based on COI‐SCIF is presented for joining the computational force of all CPUs and coprocessors, while realizing communication hiding. All the programming approaches are tested by a real‐world 3D application, for which the COI‐SCIF‐based approach shows a performance advantage on Tianhe‐2. Copyright © 2015 John Wiley & Sons, Ltd.
Bibliography:	istex:1AC1249C406D5A6A6CFC73793C2B958E37760C5E; ark:/67375/WNG-LNQSH4PP-B; ArticleID:CPE3507; National 863 Program - No. 2012AA012706; National Natural Science Foundation of China - No. 61033008, No. 61272145; FRINATEK program of the Research Council of Norway - No. 214113; Innovation Program of NUDT Graduate School - No. B100603, No. B120605, No. CJ11-06-01
DOI:	10.1002/cpe.3507