Parallel programming model for the Epiphany many-core coprocessor using threaded MPI


Bibliographic Details
Published in: Microprocessors and Microsystems, Vol. 43, pp. 95–103
Authors: Ross, James A.; Richie, David A.; Park, Song J.; Shires, Dale R.
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.06.2016
ISSN: 0141-9331, 1872-9436
Online Access: Full text
Description
Summary:
• We investigate the use of MPI for programming the Epiphany RISC array processor.
• A threaded MPI implementation adapted for coprocessor offload is presented.
• Existing MPI code for four scientific applications was re-used with minimal changes.
• Demonstrated performance exceeds 12 GFLOPS with an efficiency over 20 GFLOPS/W.
• Threaded MPI exhibits the highest performance reported using a standard parallel API.

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating-point calculations as well as parallel scalability. Yet despite these interesting architectural features, a compelling programming model had not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix–matrix multiplication, N-body particle interaction, five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture.
DOI: 10.1016/j.micpro.2016.02.006