Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters
| Published in: | Proceedings / IEEE International Conference on Cluster Computing, pp. 627-634 |
|---|---|
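| Main authors: | Odajima, Tetsuya; Boku, Taisuke; Hanawa, Toshihiro; Murai, Hitoshi; Nakao, Masahiro; Tabuchi, Akihiro; Sato, Mitsuhisa |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 2015-09-01 |
| Subject: | |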
| ISSN: | 1552-5244 |
| Abstract | When parallel HPC applications run on GPU clusters, high communication latency between GPUs on different nodes seriously limits strong scalability. To reduce this latency, we proposed the Tightly Coupled Accelerator (TCA) architecture and developed the PEACH2 board as a proof-of-concept interconnect for TCA. Although PEACH2 provides very low communication latency, its reliance on PCIe technology imposes hardware limitations, such as the practical number of nodes in a system, currently 16, a unit we call a sub-cluster. Larger numbers of nodes must be connected by a conventional interconnect such as InfiniBand, so the entire network is configured as a hybrid of a global conventional network and a local high-speed network built on PEACH2. For ease of programming, it is desirable to operate such a complicated communication system at the library or language level, which hides the system from the user. In this paper, we develop a hybrid interconnection network combining PEACH2 and InfiniBand, and implement it based on XcalableACC (XACC), a high-level PGAS language for accelerated clusters. A preliminary performance evaluation confirms that the hybrid network improves the performance of the Himeno benchmark for stencil computation by up to 40% relative to MVAPICH2 with GDR on InfiniBand. Additionally, Allgather collective communication with the hybrid network improves performance by up to 50% for networks of 8 to 16 nodes. Combining local communication, supported by the low latency of PEACH2, with global communication, supported by the high bandwidth and scalability of InfiniBand, improves overall performance. |
|---|---|
| Author | Odajima, Tetsuya; Boku, Taisuke; Nakao, Masahiro; Murai, Hitoshi; Tabuchi, Akihiro; Hanawa, Toshihiro; Sato, Mitsuhisa |
| Author_xml | 1. Odajima, Tetsuya (odajima@hpcs.cs.tsukuba.ac.jp; Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan); 2. Boku, Taisuke (Center for Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan); 3. Hanawa, Toshihiro (Inf. Technol. Center, Univ. of Tokyo, Tokyo, Japan); 4. Murai, Hitoshi; 5. Nakao, Masahiro; 6. Tabuchi, Akihiro (Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan); 7. Sato, Mitsuhisa |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CLUSTER.2015.112 |
| Discipline | Computer Science |
| EISBN | 9781467365987; 146736598X |
| EndPage | 634 |
| ExternalDocumentID | 7307661 |
| Genre | orig-research |
| ISSN | 1552-5244 |
| IngestDate | Wed Aug 27 02:50:17 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 20150901 |
| PublicationDateYYYYMMDD | 2015-09-01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings / IEEE International Conference on Cluster Computing |
| PublicationTitleAbbrev | CLUSTER |
| PublicationYear | 2015 |
| Publisher | IEEE |
| StartPage | 627 |
| SubjectTerms | Accelerator; Arrays; Communication systems; Electronics packaging; GPU Cluster; Graphics processing units; Interconnect; PGAS Language; Programming; Scalability; Tightly Coupled Accelerators; XcalableACC |
| Title | Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters |
| URI | https://ieeexplore.ieee.org/document/7307661 |
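
The abstract describes a hybrid network in which nodes inside a sub-cluster (currently up to 16 nodes) communicate over PEACH2, while nodes in different sub-clusters communicate over InfiniBand. The following is a minimal sketch of that selection rule only, not the paper's implementation. It assumes, hypothetically, that nodes carry consecutive integer IDs and that each sub-cluster is a contiguous block of 16 IDs; the names `subcluster_of` and `select_transport` are illustrative and do not come from XcalableACC or the PEACH2 software.

```python
# Hypothetical sketch: choose an interconnect per message from node IDs.
# Assumption (not from the record): node IDs are consecutive integers and
# every sub-cluster is a contiguous block of SUBCLUSTER_SIZE IDs.

SUBCLUSTER_SIZE = 16  # practical PEACH2 node limit stated in the abstract


def subcluster_of(node: int, size: int = SUBCLUSTER_SIZE) -> int:
    """Return the index of the sub-cluster that contains `node`."""
    if node < 0:
        raise ValueError("node ID must be non-negative")
    if size <= 0:
        raise ValueError("sub-cluster size must be positive")
    return node // size


def select_transport(src: int, dst: int, size: int = SUBCLUSTER_SIZE) -> str:
    """Pick the low-latency local link inside a sub-cluster, else the global one."""
    if subcluster_of(src, size) == subcluster_of(dst, size):
        return "PEACH2"
    return "InfiniBand"


if __name__ == "__main__":
    # Print the transport chosen for a few illustrative source/destination pairs.
    for src, dst in [(0, 15), (15, 16), (17, 31)]:
        print(src, dst, select_transport(src, dst))
```

Under these assumptions, a language runtime or library could apply such a rule per message so that the application never names the interconnect, which is the kind of hiding the abstract argues for.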