Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

Detailed description

Bibliographic details
Published in:	Microprocessors and Microsystems, Vol. 43, pp. 95-103
Main authors:	Ross, James A.; Richie, David A.; Park, Song J.; Shires, Dale R.
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 June 2016
Keywords:	2D RISC array; Adapteva Epiphany; Algorithms; Architecture; Computing time; Energy efficiency; Many-core; Message passing; MPI; NoC; Parallel programming; RISC; Threaded; Two dimensional
ISSN:	0141-9331, 1872-9436

Abstract	Highlights:
• We investigate the use of MPI for programming the Epiphany RISC array processor.
• A threaded MPI implementation adapted for coprocessor offload is presented.
• Existing MPI code for four scientific applications was re-used with minimal changes.
• Demonstrated performance exceeds 12 GFLOPS with an efficiency over 20 GFLOPS/W.
• Threaded MPI exhibits the highest performance reported using a standard parallel API.

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix–matrix multiplication, N-body particle interaction, five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture.
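As a rough illustration of one of the four benchmark kernels named in the abstract, the five-point 2D stencil update can be sketched serially as below. This is not the authors' code: the equal 1/5 weights, the fixed boundary, and the names `stencil_step` and `split_rows` are assumptions chosen for illustration. In an MPI version, each rank would presumably own a band (or tile) of the grid and exchange boundary rows with its neighbours before every step, which is why fast inter-core communication matters.

```python
# Serial sketch of a five-point 2D stencil update (Jacobi-style averaging).
# Weights, grid values, and function names are illustrative assumptions.

def stencil_step(grid):
    """Return a new grid in which each interior cell is the average of
    itself and its four nearest neighbours; boundary cells are copied."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return new


def split_rows(rows, nranks):
    """Hypothetical helper: block row decomposition, giving each rank a
    contiguous half-open range (start, end) of rows."""
    base, extra = divmod(rows, nranks)
    bounds, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 5.0                 # single hot cell in the centre
    print(stencil_step(grid)[2])     # middle row after one update
    print(split_rows(10, 4))         # row bands for four ranks
```

The row-band split is only one possible decomposition; a 2D tiling would match the mesh layout of the cores more closely but needs halo exchange in both directions.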
Author details	1. Ross, James A. (james.a.ross176.civ@mail.mil, james.a.ross@gmail.com), U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, United States
2. Richie, David A. (drichie@browndeertechnology.com), Brown Deer Technology, Forest Hill, MD, United States
3. Park, Song J. (song.j.park.civ@mail.mil), U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, United States
4. Shires, Dale R. (dale.r.shires.civ@mail.mil), U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, United States
Copyright	2016
DOI	10.1016/j.micpro.2016.02.006
Discipline	Computer Science Architecture
ISICitedReferencesCount	7
IsPeerReviewed	true
IsScholarly	true
PageCount	9
URI	https://dx.doi.org/10.1016/j.micpro.2016.02.006 https://www.proquest.com/docview/1816023380