Parallel matrix transpose algorithms on distributed memory concurrent computers
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a $P \times Q$ processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide app...
| Published in: | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi, pp. 245-252 |
|---|---|
| Main authors: | Jaeyoung Choi, J.J. Dongarra, D.W. Walker |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE Comput. Soc. Press, 1993 |
| Subjects: | |
| ISBN: | 0818649801, 9780818649806 |
| Abstract | This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a $P \times Q$ processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of nonblocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, $C = A \cdot B$, the algorithms are used to compute parallel multiplications of transposed matrices, $C = A^T \cdot B^T$, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer. |
|---|---|
| Author | Jaeyoung Choi; Dongarra, J.J.; Walker, D.W. |
| Author_xml | 1. Jaeyoung Choi (Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA); 2. Dongarra, J.J. (Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA); 3. Walker, D.W. (Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/SPLC.1993.365559 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Computer Science |
| EndPage | 252 |
| ExternalDocumentID | 365559 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 8 |
| PublicationDate | 1993 |
| PublicationTitle | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi |
| PublicationTitleAbbrev | SPLC |
| PublicationYear | 1993 |
| Publisher | IEEE Comput. Soc. Press |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 245 |
| SubjectTerms | Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering |
| Title | Parallel matrix transpose algorithms on distributed memory concurrent computers |
| URI | https://ieeexplore.ieee.org/document/365559 |
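
The abstract's setting (a block scattered distribution over a $P \times Q$ processor template, transposed with point-to-point messages) can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the usual block-cyclic rule that block (i, j) lives on processor (i mod P, j mod Q), and that the transposed matrix is stored on the same template. The function names `owner` and `transpose_traffic` are hypothetical, and the sketch only counts which blocks would move between which processors; it performs no communication.

```python
from collections import defaultdict


def owner(block_row, block_col, P, Q):
    """Processor (p, q) holding block (block_row, block_col).

    Assumption: block scattered (block-cyclic) distribution over a
    P x Q processor template, i.e. p = row mod P, q = col mod Q.
    """
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """Group the blocks of an Mb x Nb block matrix A by (source, destination).

    Block A(i, j) becomes block (j, i) of the transpose, so under the
    assumed distribution it must end up on owner(j, i, P, Q).
    """
    traffic = defaultdict(list)
    for i in range(Mb):
        for j in range(Nb):
            src = owner(i, j, P, Q)  # where A(i, j) is stored
            dst = owner(j, i, P, Q)  # where block (j, i) of the transpose is stored
            traffic[(src, dst)].append((i, j))
    return dict(traffic)


if __name__ == "__main__":
    # Hypothetical sizes: a 6 x 6 grid of blocks on a 2 x 3 template.
    for (src, dst), blocks in sorted(transpose_traffic(6, 6, 2, 3).items()):
        print(src, "->", dst, blocks)
```

With a 2 x 3 template and a 6 x 6 grid of blocks, processor (0, 0) in this sketch holds blocks bound for six different processors (one of them itself). A fan-out like this is where overlapping sends through nonblocking communication, as the abstract describes, would plausibly help.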

