Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

Bibliographic details
Published in: Microprocessors and microsystems Vol. 43; pp. 95 - 103
Main authors: Ross, James A., Richie, David A., Park, Song J., Shires, Dale R.
Format: Journal Article
Language: English
Published: Elsevier B.V 01.06.2016
Keywords:
ISSN: 0141-9331, 1872-9436
Online access: Full text
Abstract
• We investigate the use of MPI for programming the Epiphany RISC array processor.
• A threaded MPI implementation adapted for coprocessor offload is presented.
• Existing MPI code for four scientific applications was re-used with minimal changes.
• Demonstrated performance exceeds 12 GFLOPS with an efficiency over 20 GFLOPS/W.
• Threaded MPI exhibits the highest performance reported using a standard parallel API.

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix–matrix multiplication, N-body particle interaction, five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture.
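Illustrative note (not part of the catalog record or the paper): of the four benchmarks named in the abstract, the five-point 2D stencil update is the simplest to sketch. The snippet below is a minimal serial sketch in Python, not the authors' threaded MPI code. The function name `five_point_step`, the equal weights, and the fixed boundary are assumptions chosen for illustration only.

```python
def five_point_step(grid):
    """Return a new grid after one five-point stencil update.

    Each interior cell becomes the average of itself and its four
    direct neighbours (up, down, left, right). Boundary cells are
    copied unchanged. The equal weights are an illustrative assumption.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so updates read only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (grid[i][j]
                         + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return new


# Small usage example: a single hot cell in the centre of a 3x3 grid.
grid = [[0.0] * 3 for _ in range(3)]
grid[1][1] = 5.0
result = five_point_step(grid)
```

In a message-passing version, each rank would presumably hold one tile of the grid and exchange its edge rows and columns with neighbouring ranks before every update, which is one way to read the abstract's emphasis on fast inter-core communication.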
Author Park, Song J.
Ross, James A.
Shires, Dale R.
Richie, David A.
Author_xml – sequence: 1
  givenname: James A.
  surname: Ross
  fullname: Ross, James A.
  email: james.a.ross176.civ@mail.mil, james.a.ross@gmail.com
  organization: U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, United States
– sequence: 2
  givenname: David A.
  surname: Richie
  fullname: Richie, David A.
  email: drichie@browndeertechnology.com
  organization: Brown Deer Technology, Forest Hill, MD, United States
– sequence: 3
  givenname: Song J.
  surname: Park
  fullname: Park, Song J.
  email: song.j.park.civ@mail.mil
  organization: U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, United States
– sequence: 4
  givenname: Dale R.
  surname: Shires
  fullname: Shires, Dale R.
  email: dale.r.shires.civ@mail.mil
  organization: U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, United States
CitedBy_id crossref_primary_10_1134_S1995080218090159
crossref_primary_10_1016_j_micpro_2016_05_002
Cites_doi 10.1109/JSSC.2007.910957
10.1006/jpdc.2000.1674
10.1145/1498765.1498785
10.1016/j.jocs.2015.04.023
10.1016/j.parco.2007.07.002
10.1016/j.procs.2013.05.333
10.1109/MM.2007.4378780
ContentType Journal Article
Copyright 2016
Copyright_xml – notice: 2016
DBID AAYXX
CITATION
7SC
7SP
8FD
F28
FR3
JQ2
L7M
L~C
L~D
DOI 10.1016/j.micpro.2016.02.006
DatabaseName CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
Architecture
EISSN 1872-9436
EndPage 103
ExternalDocumentID 10_1016_j_micpro_2016_02_006
S0141933116000375
ID FETCH-LOGICAL-c339t-d93a704efb55fb78727eb13f87561ebfab70eb68314a00dba6f8e873214f5adf3
ISICitedReferencesCount 7
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000377740500009&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0141-9331
IngestDate Thu Oct 02 10:27:07 EDT 2025
Sat Nov 29 05:51:32 EST 2025
Tue Nov 18 22:27:45 EST 2025
Fri Feb 23 02:26:34 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Energy efficiency
NoC
Adapteva Epiphany
2D RISC array
Many-core
MPI
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c339t-d93a704efb55fb78727eb13f87561ebfab70eb68314a00dba6f8e873214f5adf3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
PQID 1816023380
PQPubID 23500
PageCount 9
ParticipantIDs proquest_miscellaneous_1816023380
crossref_primary_10_1016_j_micpro_2016_02_006
crossref_citationtrail_10_1016_j_micpro_2016_02_006
elsevier_sciencedirect_doi_10_1016_j_micpro_2016_02_006
PublicationCentury 2000
PublicationDate June 2016
2016-06-00
20160601
PublicationDateYYYYMMDD 2016-06-01
PublicationDate_xml – month: 06
  year: 2016
  text: June 2016
PublicationDecade 2010
PublicationTitle Microprocessors and microsystems
PublicationYear 2016
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
SSID ssj0005062
Score 2.0792997
SourceID proquest
crossref
elsevier
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 95
SubjectTerms 2D RISC array
Adapteva Epiphany
Algorithms
Architecture
Computing time
Energy efficiency
Many-core
Message passing
MPI
NoC
Parallel programming
RISC
Threaded
Two dimensional
Title Parallel programming model for the Epiphany many-core coprocessor using threaded MPI
URI https://dx.doi.org/10.1016/j.micpro.2016.02.006
https://www.proquest.com/docview/1816023380
Volume 43