Parallel matrix transpose algorithms on distributed memory concurrent computers
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide app...
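As a rough illustration of the block scattered distribution mentioned above, the following Python sketch maps each matrix block to a processor in a P × Q template and lists which blocks would have to move between processors to form the transpose. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names `owner` and `transpose_traffic` are hypothetical, the layout is taken to be the usual block-cyclic rule (block (i, j) lives on processor (i mod P, j mod Q)), and the transposed matrix is assumed to use the same template and block size as the original.

```python
from collections import defaultdict


def owner(block_row, block_col, P, Q):
    """Processor coordinates holding a block under an assumed block-cyclic
    layout on a P x Q processor template (hypothetical helper)."""
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """For an Mb x Nb grid of blocks, map each (source, destination)
    processor pair to the blocks that move between them when A^T is formed.

    Assumption: A^T is distributed over the same P x Q template, so block
    (i, j) of A ends up as block (j, i) of A^T.
    """
    traffic = defaultdict(list)
    for i in range(Mb):
        for j in range(Nb):
            src = owner(i, j, P, Q)
            dst = owner(j, i, P, Q)  # destination of the transposed block
            traffic[(src, dst)].append((i, j))
    return dict(traffic)


if __name__ == "__main__":
    # Example: a 6 x 6 grid of blocks on a 2 x 3 processor template.
    for (src, dst), blocks in sorted(transpose_traffic(6, 6, 2, 3).items()):
        print(src, "->", dst, blocks)
```

Each block would also need to be transposed locally once it arrives. The sketch only enumerates the communication pattern; it does not model the non-blocking message passing itself.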
| Published in: | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi, pp. 245-252 |
|---|---|
| Main authors: | Jaeyoung Choi, J.J. Dongarra, D.W. Walker |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE Comput. Soc. Press, 1993 |
| Subjects: | |
| ISBN: | 0818649801, 9780818649806 |
| Online access: | Get full text |
| Abstract | This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A·B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ·Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer. |
|---|---|
| Author | Jaeyoung Choi; Dongarra, J.J.; Walker, D.W. |
| Author_xml | – sequence: 1 surname: Jaeyoung Choi fullname: Jaeyoung Choi organization: Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA – sequence: 2 givenname: J.J. surname: Dongarra fullname: Dongarra, J.J. organization: Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA – sequence: 3 givenname: D.W. surname: Walker fullname: Walker, D.W. organization: Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/SPLC.1993.365559 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EndPage | 252 |
| ExternalDocumentID | 365559 |
| GroupedDBID | 6IE 6IK 6IL AAJGR AAWTH ACGHX ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK OCL RIB RIC RIE RIL |
| ID | FETCH-LOGICAL-i89t-2298bfadbb483721536a27e9461e9ca93a22764a3a3ea625f93f503f12c149ec3 |
| IEDL.DBID | RIE |
| ISBN | 0818649801 9780818649806 |
| IngestDate | Tue Aug 26 22:02:01 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i89t-2298bfadbb483721536a27e9461e9ca93a22764a3a3ea625f93f503f12c149ec3 |
| PageCount | 8 |
| ParticipantIDs | ieee_primary_365559 |
| PublicationCentury | 1900 |
| PublicationDate | 19930000 |
| PublicationDateYYYYMMDD | 1993-01-01 |
| PublicationDate_xml | – year: 1993 text: 19930000 |
| PublicationDecade | 1990 |
| PublicationTitle | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi |
| PublicationTitleAbbrev | SPLC |
| PublicationYear | 1993 |
| Publisher | IEEE Comput. Soc. Press |
| Publisher_xml | – name: IEEE Comput. Soc. Press |
| SSID | ssj0000444045 |
| Score | 1.2370282 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 245 |
| SubjectTerms | Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering |
| Title | Parallel matrix transpose algorithms on distributed memory concurrent computers |
| URI | https://ieeexplore.ieee.org/document/365559 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |

