Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication
The increasing trend of manycore processors makes multithreaded communication more important to avoid costly global synchronization among cores. One of the representative approaches that require multithreaded communication is the global task-based programming model. In the model, a program is divide...
| Published in: | The Journal of supercomputing, Volume 80, Issue 14, pp. 20715-20742 |
|---|---|
| Main authors: | Watanabe, Yutaka; Tsuji, Miwako; Murai, Hitoshi; Boku, Taisuke; Sato, Mitsuhisa |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: Springer US; Springer Nature B.V; 1 September 2024 |
| Subject: | |
| ISSN: | 0920-8542, 1573-0484 |
| Online access: | Get full text |
| Abstract | The growing prevalence of manycore processors makes multithreaded communication increasingly important for avoiding costly global synchronization among cores. One representative approach that requires multithreaded communication is the global task-based programming model. In this model, a program is divided into tasks that each node executes asynchronously, so independent thread-to-thread communication is expected. However, the Message Passing Interface (MPI)-based approach is inefficient because of design issues. In this research, we design and implement the utofu transport layer in Unified Communication X (UCX), an abstracted communication library, for efficient remote direct memory access (RDMA)-based multithreaded communication on Tofu Interconnect D. The evaluation results on Fugaku show that UCX significantly improves multithreaded performance over MPI while maintaining portability between systems. UCX shows about 32.8 times lower latency than Fujitsu MPI with 24 threads in the multithreaded pingpong benchmark and about 37.8 times higher update rate than Fujitsu MPI with 24 threads on 256 nodes in the multithreaded GUPs benchmark. |
|---|---|
| Author | Boku, Taisuke; Tsuji, Miwako; Watanabe, Yutaka; Murai, Hitoshi; Sato, Mitsuhisa |
| Author_xml | 1. Watanabe, Yutaka (Graduate School of Science and Technology, University of Tsukuba; ywatanabe@hpcs.cs.tsukuba.ac.jp); 2. Tsuji, Miwako (RIKEN Center for Computational Science); 3. Murai, Hitoshi (RIKEN Center for Computational Science); 4. Boku, Taisuke (Graduate School of Science and Technology, University of Tsukuba; Center for Computational Sciences, University of Tsukuba); 5. Sato, Mitsuhisa (RIKEN Center for Computational Science) |
| ContentType | Journal Article |
| Copyright | The Author(s) 2024. corrected publication 2024 The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1007/s11227-024-06201-x |
| Discipline | Computer Science |
| EISSN | 1573-0484 |
| EndPage | 20742 |
| ExternalDocumentID | 10_1007_s11227_024_06201_x |
| ISICitedReferencesCount | 1 |
| ISSN | 0920-8542 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 14 |
| Keywords | Tofu Interconnect D; UCX; A64FX; One-sided communication; Supercomputer Fugaku; Multithreaded communication |
| Language | English |
| OpenAccessLink | https://link.springer.com/10.1007/s11227-024-06201-x |
| PQID | 3256593404 |
| PQPubID | 2043774 |
| PageCount | 28 |
| ParticipantIDs | proquest_journals_3256593404 crossref_primary_10_1007_s11227_024_06201_x springer_journals_10_1007_s11227_024_06201_x |
| PublicationCentury | 2000 |
| PublicationDate | 2024-09-01 |
| PublicationDateYYYYMMDD | 2024-09-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationSubtitle | An International Journal of High-Performance Computer Design, Analysis, and Use |
| PublicationTitle | The Journal of supercomputing |
| PublicationTitleAbbrev | J Supercomput |
| PublicationYear | 2024 |
| Publisher | Springer US; Springer Nature B.V |
| SourceID | proquest crossref springer |
| SourceType | Aggregation Database Index Database Publisher |
| StartPage | 20715 |
| SubjectTerms | Bandwidths; Benchmarks; Communication; Compilers; Computer Science; Design; High performance computing; Interpreters; Laboratories; Libraries; Message passing; Network interface cards; Network topologies; Performance evaluation; Processor Architectures; Programming Languages; Soy products; Supercomputers; Synchronism; Tofu |
| Title | Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication |
| URI | https://link.springer.com/article/10.1007/s11227-024-06201-x https://www.proquest.com/docview/3256593404 |
| Volume | 80 |