Parallel matrix transpose algorithms on distributed memory concurrent computers

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide app...

Bibliographic Details
Published in:	Proceedings of the Scalable Parallel Libraries Conference, October 6-8, 1993, Mississippi State, Mississippi, pp. 245-252
Main Authors:	Choi, Jaeyoung; Dongarra, J.J.; Walker, D.W.
Format:	Conference Proceeding
Language:	English
Published:	IEEE Comput. Soc. Press 1993
Subjects:	Application software; Computer architecture; Concurrent computing; Distributed computing; Laboratories; Lifting equipment; Linear algebra; Matrix decomposition; Packaging; Scattering
ISBN:	0818649801, 9780818649806

Abstract	This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ · Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
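
As an illustration of the data layout the abstract refers to, the following is a minimal sketch and not the paper's implementation. It assumes the usual block scattered (block-cyclic) rule in which block (i, j) of the matrix lives on processor (i mod P, j mod Q), and it only tabulates which processor pairs would exchange blocks to form the transpose. The function names `owner` and `transpose_traffic` and the block counts `Mb`, `Nb` are hypothetical, introduced here for the example.

```python
from collections import defaultdict


def owner(block_row, block_col, P, Q):
    """Processor coordinates holding a block, assuming the block-cyclic rule
    (block_row mod P, block_col mod Q). This rule is an assumption of the sketch."""
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """For an Mb x Nb grid of blocks, map each (source, destination) processor
    pair to the list of blocks of A the source would send to build A^T."""
    traffic = defaultdict(list)
    for i in range(Mb):
        for j in range(Nb):
            src = owner(i, j, P, Q)
            # Block (i, j) of A becomes block (j, i) of A^T.
            dst = owner(j, i, P, Q)
            traffic[(src, dst)].append((i, j))
    return dict(traffic)


if __name__ == "__main__":
    # A 6 x 6 grid of blocks on a 2 x 3 processor template.
    for (src, dst), blocks in sorted(transpose_traffic(6, 6, 2, 3).items()):
        print(src, "->", dst, blocks)
```

In a message-passing setting, each entry of the resulting table would correspond to one point-to-point message; posting those sends without waiting for each to complete is the kind of overlap the abstract attributes to nonblocking communication.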
Author affiliations	Jaeyoung Choi; J.J. Dongarra; D.W. Walker (all: Math. Sci. Sect., Oak Ridge Nat. Lab., TN, USA)
DOI	10.1109/SPLC.1993.365559
Discipline	Computer Science
URI	https://ieeexplore.ieee.org/document/365559