Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication

Bibliographic details
Published in: The Journal of Supercomputing, Volume 80, Issue 14, pp. 20715-20742
Main authors: Watanabe, Yutaka; Tsuji, Miwako; Murai, Hitoshi; Boku, Taisuke; Sato, Mitsuhisa
Format: Journal Article
Language: English
Publication details: New York: Springer US, 01.09.2024
Springer Nature B.V
ISSN: 0920-8542, 1573-0484
Abstract The growing prevalence of manycore processors makes multithreaded communication increasingly important for avoiding costly global synchronization among cores. One representative approach that requires multithreaded communication is the global task-based programming model. In this model, a program is divided into tasks, the tasks are executed asynchronously on each node, and independent thread-to-thread communication is expected. However, the Message Passing Interface (MPI)-based approach is not efficient because of design issues. In this research, we design and implement the utofu transport layer in an abstracted communication library called Unified Communication X (UCX) for efficient remote direct memory access (RDMA)-based multithreaded communication on Tofu Interconnect D. The evaluation results on Fugaku show that UCX can significantly improve multithreaded performance over MPI while maintaining portability between systems. UCX shows about 32.8 times lower latency than Fujitsu MPI with 24 threads in the multithreaded pingpong benchmark and about 37.8 times higher update rate than Fujitsu MPI with 24 threads on 256 nodes in the multithreaded GUPS benchmark.
Author Boku, Taisuke
Tsuji, Miwako
Watanabe, Yutaka
Murai, Hitoshi
Sato, Mitsuhisa
Author_xml – sequence: 1
  givenname: Yutaka
  surname: Watanabe
  fullname: Watanabe, Yutaka
  email: ywatanabe@hpcs.cs.tsukuba.ac.jp
  organization: Graduate School of Science and Technology, University of Tsukuba
– sequence: 2
  givenname: Miwako
  surname: Tsuji
  fullname: Tsuji, Miwako
  organization: RIKEN Center for Computational Science
– sequence: 3
  givenname: Hitoshi
  surname: Murai
  fullname: Murai, Hitoshi
  organization: RIKEN Center for Computational Science
– sequence: 4
  givenname: Taisuke
  surname: Boku
  fullname: Boku, Taisuke
  organization: Graduate School of Science and Technology, University of Tsukuba, Center for Computational Sciences, University of Tsukuba
– sequence: 5
  givenname: Mitsuhisa
  surname: Sato
  fullname: Sato, Mitsuhisa
  organization: RIKEN Center for Computational Science
ContentType Journal Article
Copyright The Author(s) 2024. corrected publication 2024
The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1007/s11227-024-06201-x
Discipline Computer Science
EISSN 1573-0484
EndPage 20742
ISICitedReferencesCount 1
ISSN 0920-8542
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 14
Keywords Tofu Interconnect D
UCX
A64FX
One-sided communication
Supercomputer Fugaku
Multithreaded communication
Language English
OpenAccessLink https://link.springer.com/10.1007/s11227-024-06201-x
PageCount 28
PublicationDate 2024-09-01
PublicationPlace New York
PublicationSubtitle An International Journal of High-Performance Computer Design, Analysis, and Use
PublicationTitle The Journal of supercomputing
PublicationTitleAbbrev J Supercomput
PublicationYear 2024
Publisher Springer US
Springer Nature B.V
StartPage 20715
SubjectTerms Bandwidths
Benchmarks
Communication
Compilers
Computer Science
Design
High performance computing
Interpreters
Laboratories
Libraries
Message passing
Network interface cards
Network topologies
Performance evaluation
Processor Architectures
Programming Languages
Soy products
Supercomputers
Synchronism
Tofu
Title Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication
URI https://link.springer.com/article/10.1007/s11227-024-06201-x
https://www.proquest.com/docview/3256593404
Volume 80