Parallel matrix transpose algorithms on distributed memory concurrent computers
| Published in: | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi, pp. 245-252 |
|---|---|
| Main Authors: | Jaeyoung Choi, J.J. Dongarra, D.W. Walker |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE Comput. Soc. Press, 1993 |
| ISBN: | 0818649801, 9780818649806 |

| Abstract | This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P×Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A·B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ·Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer. |
|---|---|
| Author | Choi, Jaeyoung; Dongarra, J.J.; Walker, D.W. |
| Affiliation | Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA (all three authors) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/SPLC.1993.365559 |
| Discipline | Computer Science |
| EndPage | 252 |
| ExternalDocumentID | 365559 |
| IsPeerReviewed | false |
| IsScholarly | false |
| PageCount | 8 |
| PublicationTitleAbbrev | SPLC |
| PublicationYear | 1993 |
| StartPage | 245 |
| SubjectTerms | Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering |
| Title | Parallel matrix transpose algorithms on distributed memory concurrent computers |
| URI | https://ieeexplore.ieee.org/document/365559 |

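The data layout and communication pattern described in the abstract can be made concrete with a small sequential simulation. This is a minimal sketch, not the authors' algorithm; the paper's implementation and the PUMMA routines are not reproduced here. It assumes square nb × nb blocks, that block (I, J) of the matrix is owned by process (I mod P, J mod Q) of the P × Q template, and that each process bundles its outgoing blocks into one message per destination. The function names `owner`, `distribute`, `transpose_messages`, `deliver`, and `assemble` are hypothetical.

```python
from collections import defaultdict


def owner(bi, bj, P, Q):
    """Assumed block scattered (block-cyclic) rule: block (bi, bj)
    lives on process (bi mod P, bj mod Q) of the P x Q template."""
    return (bi % P, bj % Q)


def distribute(A, nb, P, Q):
    """Split dense A (list of rows) into nb x nb blocks (edge blocks may be
    smaller) and return {process: {(bi, bj): block}}."""
    m, n = len(A), len(A[0])
    local = defaultdict(dict)
    for bi in range((m + nb - 1) // nb):
        for bj in range((n + nb - 1) // nb):
            block = [row[bj * nb:(bj + 1) * nb] for row in A[bi * nb:(bi + 1) * nb]]
            local[owner(bi, bj, P, Q)][(bi, bj)] = block
    return local


def transpose_messages(local, P, Q):
    """Block (bi, bj) of A becomes block (bj, bi) of A^T, whose owner is
    owner(bj, bi, P, Q). Group each process's transposed blocks by
    destination: {src: {dst: [((bj, bi), transposed_block), ...]}}."""
    msgs = {}
    for src, blocks in local.items():
        out = defaultdict(list)
        for (bi, bj), block in blocks.items():
            bt = [list(col) for col in zip(*block)]  # transpose the block locally
            out[owner(bj, bi, P, Q)].append(((bj, bi), bt))
        msgs[src] = dict(out)
    return msgs


def deliver(msgs):
    """Simulate message receipt: every destination stores what it was sent."""
    received = defaultdict(dict)
    for per_dst in msgs.values():
        for dst, items in per_dst.items():
            for key, bt in items:
                received[dst][key] = bt
    return received


def assemble(local, m, n, nb):
    """Gather distributed blocks back into a dense m x n matrix."""
    C = [[None] * n for _ in range(m)]
    for blocks in local.values():
        for (bi, bj), block in blocks.items():
            for r, row in enumerate(block):
                for c, v in enumerate(row):
                    C[bi * nb + r][bj * nb + c] = v
    return C


if __name__ == "__main__":
    A = [[10 * i + j for j in range(7)] for i in range(5)]  # 5 x 7 test matrix
    P, Q, nb = 2, 3, 2
    msgs = transpose_messages(distribute(A, nb, P, Q), P, Q)
    AT = assemble(deliver(msgs), 7, 5, nb)
    print(AT == [list(col) for col in zip(*A)])  # prints True
```

Because block (I, J) of A becomes block (J, I) of Aᵀ, its destination is (J mod P, I mod Q), so a process generally sends to several distinct destinations. In a message-passing code, each entry of `msgs[src]` could be posted as its own non-blocking send (for example `MPI_Isend`), which is the kind of overlap between messages to different processors that the abstract describes.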
