Parallel matrix transpose algorithms on distributed memory concurrent computers
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide app...
| Published in: | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi, pp. 245-252 |
|---|---|
| Main authors: | Jaeyoung Choi, J.J. Dongarra, D.W. Walker |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE Comput. Soc. Press, 1993 |
| Subjects: | |
| ISBN: | 0818649801, 9780818649806 |
| Online access: | Get full text |
| Abstract | This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms use non-blocking, point-to-point communication between processors. Non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ · Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer. |
|---|---|
| Author | Jaeyoung Choi; Dongarra, J.J.; Walker, D.W. |
| Author affiliation | Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA (all three authors) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/SPLC.1993.365559 |
| Discipline | Computer Science |
| EndPage | 252 |
| ISBN | 0818649801 9780818649806 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 8 |
| PublicationTitle | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi |
| PublicationTitleAbbrev | SPLC |
| PublicationYear | 1993 |
| Publisher | IEEE Comput. Soc. Press |
| StartPage | 245 |
| SubjectTerms | Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering |
| Title | Parallel matrix transpose algorithms on distributed memory concurrent computers |
| URI | https://ieeexplore.ieee.org/document/365559 |
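
The block scattered distribution named in the abstract can be illustrated with a small bookkeeping sketch. It assumes the usual block-cyclic convention, in which zero-based global block (i, j) lives on process (i mod P, j mod Q); this record does not spell out that mapping, so treat it as an assumption. The function names `owner` and `transpose_traffic` are hypothetical, and the code only tabulates which process would send which blocks to which other process in a transpose. It is not the paper's algorithm and performs no communication.

```python
def owner(block_row, block_col, P, Q):
    """Process coordinates holding global block (block_row, block_col).

    Assumes a block-cyclic (block scattered) layout on a P x Q template
    with zero-based block indices.
    """
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """Group the blocks of an Mb x Nb block matrix by (sender, receiver).

    Block (i, j) of A becomes block (j, i) of the transpose, so it moves
    from owner(i, j) to owner(j, i).
    """
    traffic = {}
    for i in range(Mb):
        for j in range(Nb):
            src = owner(i, j, P, Q)
            dst = owner(j, i, P, Q)
            traffic.setdefault((src, dst), []).append((i, j))
    return traffic


if __name__ == "__main__":
    # 4 x 6 blocks on a 2 x 3 process template (illustrative sizes only).
    for (src, dst), blocks in sorted(transpose_traffic(4, 6, 2, 3).items()):
        print(src, "->", dst, blocks)
```

Because each sender generally has several distinct receivers, a table like this shows where non-blocking sends could overlap, which is the motivation the abstract gives for using them.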

