Parallel matrix transpose algorithms on distributed memory concurrent computers
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide app...
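The block scattered layout mentioned above can be illustrated with a small simulation. This is a minimal sketch and not the paper's implementation: it assumes square r × r blocks and the common block-cyclic convention in which block (I, J) lives on processor (I mod P, J mod Q). The function names `owner`, `scatter`, `transpose`, and `gather` are hypothetical, and messages are delivered in a plain loop rather than with real point-to-point sends.

```python
from collections import defaultdict

def owner(i, j, r, P, Q):
    """Processor (p, q) holding global element (i, j), assuming r x r blocks on a P x Q grid."""
    return ((i // r) % P, (j // r) % Q)

def scatter(A, r, P, Q):
    """Distribute a dense matrix (list of rows) into per-processor dictionaries."""
    local = defaultdict(dict)
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            local[owner(i, j, r, P, Q)][(i, j)] = value
    return local

def transpose(local, r, P, Q):
    """Build one message per (source, destination) pair, then deliver them all."""
    messages = defaultdict(list)
    for src, elements in local.items():
        for (i, j), value in elements.items():
            dst = owner(j, i, r, P, Q)  # A[i][j] becomes entry (j, i) of the transpose
            messages[(src, dst)].append(((j, i), value))
    result = defaultdict(dict)
    for (_, dst), payload in messages.items():
        result[dst].update(payload)  # sequential stand-in for a receive
    return result, messages

def gather(local, rows, cols):
    """Reassemble a dense rows x cols matrix from the per-processor pieces."""
    M = [[None] * cols for _ in range(rows)]
    for elements in local.values():
        for (i, j), value in elements.items():
            M[i][j] = value
    return M
```

For a 5 × 7 matrix `A`, calling `transpose(scatter(A, 2, 2, 3), 2, 2, 3)` and passing the first returned value to `gather(..., 7, 5)` should reproduce the ordinary transpose. The keys of the returned `messages` dictionary list which processor pairs exchange data, which is the communication pattern a real implementation would have to schedule.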
| Published in: | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi, pp. 245-252 |
|---|---|
| Main Authors: | Jaeyoung Choi, J.J. Dongarra, D.W. Walker |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE Comput. Soc. Press, 1993 |
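| Subjects: | Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering |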
| ISBN: | 0818649801, 9780818649806 |
| Abstract | This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A·B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ·Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer. |
|---|---|
| Author | Jaeyoung Choi; Dongarra, J.J.; Walker, D.W. |
| Author_xml | 1. Jaeyoung Choi (Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA); 2. Dongarra, J.J. (Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA); 3. Walker, D.W. (Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/SPLC.1993.365559 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Computer Science |
| EndPage | 252 |
| ExternalDocumentID | 365559 |
| ISBN | 0818649801 9780818649806 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 8 |
| PublicationCentury | 1900 |
| PublicationDate | 19930000 |
| PublicationDateYYYYMMDD | 1993-01-01 |
| PublicationDate_xml | – year: 1993 text: 19930000 |
| PublicationDecade | 1990 |
| PublicationTitle | Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi |
| PublicationTitleAbbrev | SPLC |
| PublicationYear | 1993 |
| Publisher | IEEE Comput. Soc. Press |
| Publisher_xml | – name: IEEE Comput. Soc. Press |
| Snippet | This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P/spl... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 245 |
| SubjectTerms | Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering |
| Title | Parallel matrix transpose algorithms on distributed memory concurrent computers |
| URI | https://ieeexplore.ieee.org/document/365559 |
| hasFullText | 1 |
| inHoldings | 1 |

