Parallel matrix transpose algorithms on distributed memory concurrent computers

Bibliographic details
Published in:	Parallel Computing, Vol. 21, No. 9, pp. 1387–1405
Main authors:	Choi, Jaeyoung; Dongarra, Jack J.; Walker, David W.
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier B.V., 1 September 1995
Subjects:	Algorithmics. Computability. Computer arithmetics, Applied sciences, Computer science; control theory; systems, Computer systems and distributed systems. User interface, Distributed memory multiprocessors, Exact sciences and technology, Intel Touchstone Delta, Linear algebra, Matrix transpose algorithm, Memory and file management (including protection and security), Memory organisation. Data processing, Point-to-point communication, Software, Theoretical computing
Keywords:	Distributed memory multiprocessors, Matrix transpose algorithm, Intel Touchstone Delta, Point-to-point communication, Linear algebra, Matrix diagonalization, Parallel algorithm, Distributed memory multiprocessor system, Matrix inversion, Distributed algorithm, Matrix calculus, Matrix product, Parallelism, Distributed system, Implementation
ISSN:	0167-8191 (print), 1872-7336 (online)

Abstract	This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block cyclic data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed in LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ · Bᵀ, in the PUMMA package [5]. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
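The step count stated in the abstract can be checked with a small model. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that A and Aᵀ use the same P × Q template and the same square block size, so block (i, j) of A, owned by process (i mod P, j mod Q), becomes block (j, i) of Aᵀ. It also reads a "wrapped diagonal" d as the blocks (i, (i + d) mod LCM) of the LCM × LCM block template. The helper names `owner`, `transpose_schedule`, and `check_schedule` are hypothetical. Under this reading, each step handles GCD consecutive diagonals, every process sends exactly one template block and receives exactly one, and LCM/GCD steps cover all LCM diagonals.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(bi: int, bj: int, P: int, Q: int) -> tuple[int, int]:
    """Grid coordinates of the process holding block (bi, bj) in a block cyclic layout."""
    return (bi % P, bj % Q)


def transpose_schedule(P: int, Q: int) -> list[list[tuple]]:
    """Group the blocks of the LCM x LCM template into LCM/GCD communication steps.

    Hypothetical model: block (i, j) of A becomes block (j, i) of A^T, and both
    matrices use the same P x Q template, so the destination is owner(j, i).
    Step s handles the GCD wrapped diagonals d = s*GCD, ..., s*GCD + GCD - 1,
    where diagonal d holds the blocks (i, (i + d) mod LCM).
    Each entry is (source process, destination process, block index).
    """
    G, L = gcd(P, Q), lcm(P, Q)
    steps = []
    for s in range(L // G):
        step = []
        for d in range(s * G, (s + 1) * G):  # GCD diagonals handled in this step
            for i in range(L):
                j = (i + d) % L  # block (i, j) lies on wrapped diagonal d
                step.append((owner(i, j, P, Q), owner(j, i, P, Q), (i, j)))
        steps.append(step)
    return steps


def check_schedule(P: int, Q: int) -> int:
    """Assert that every process sends and receives exactly one block per step; return the step count."""
    everyone = sorted((p, q) for p in range(P) for q in range(Q))
    steps = transpose_schedule(P, Q)
    for step in steps:
        assert sorted(src for src, _, _ in step) == everyone
        assert sorted(dst for _, dst, _ in step) == everyone
    return len(steps)


if __name__ == "__main__":
    for P, Q in [(2, 3), (4, 6), (3, 3), (2, 4)]:
        print(f"{P} x {Q}: {check_schedule(P, Q)} steps")
```

In this model a 2 × 3 grid (GCD = 1, LCM = 6) needs 6 steps, and over those steps each process sends blocks to all six processes, itself included. This matches the complete exchange described for relatively prime P and Q. A 4 × 6 grid (GCD = 2, LCM = 12) also needs 6 steps, a 2 × 4 grid needs 2, and a square 3 × 3 grid needs only 1. When the matrix has more than LCM block rows or columns, blocks whose indices agree modulo LCM share the same source and destination. The model assumes such blocks would be packed into a single message.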
Affiliations	Choi, Jaeyoung (choi@msr.epm.ornl.gov): School of Computing, Soongsil University, 1-1 Sangdo-Dong, Dongjak-Ku, Seoul 156-743, South Korea; Dongarra, Jack J. and Walker, David W.: Mathematical Sciences Section, Oak Ridge National Laboratory, P.O. Box 2008, Bldg. 6012, Oak Ridge, TN 37831-6367, USA
CODEN	PACOEJ
Copyright	1995; 1995 INIST-CNRS
DOI	10.1016/0167-8191(95)00016-H
Discipline	Computer Science, Applied Sciences
ISICitedReferencesCount	24
IsPeerReviewed	true
IsScholarly	true
License	https://www.elsevier.com/tdm/userlicense/1.0; CC BY 4.0
PageCount	19
References
[1] Azari, Bojanczyk, Lee (1988). Synchronous and asynchronous algorithms for matrix transposition on MCAP. SPIE Vol. 975, Advanced Algorithms and Architecture for Signal Processing III, pp. 277–288.
[2] Bokhari, Berryman (1992). Complete exchange on a circuit switched mesh. Proc. Scalable High Performance Computing Conf., pp. 300–306.
[3] Choi, Dongarra, Pozo, Walker (1992). ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers. Proc. Fourth Symp. on the Frontiers of Massively Parallel Computation (McLean, Virginia).
[4] Choi, Dongarra, Walker (1992). The design of scalable software libraries for distributed memory concurrent computers. Proc. Environment and Tools for Parallel Scientific Computing Workshop (Saint Hilaire du Touvet, France).
[5] Choi, Dongarra, Walker (1994). PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers. Concurrency: Practice and Experience 6, pp. 543–570. doi:10.1002/cpe.4330060702
[6] Dongarra, van de Geijn, Walker (1992). A look at scalable linear algebra libraries. Proc. 1992 Scalable High Performance Computing Conf., pp. 372–379.
[7] Eklundh (1972). A fast computer method for matrix transposing. IEEE Trans. Comput. 21, pp. 801–803. doi:10.1109/T-C.1972.223584
[8] Golub, Van Loan (1989). Matrix Computations.
[9] Intel Corporation (1991). Touchstone Delta Fortran Calls Reference Manual.
[10] Johnsson, Ho (1988). Algorithms for matrix transposition on boolean n-cube configured ensemble architecture. SIAM J. Matrix Anal. Appl. 9, pp. 419–454. doi:10.1137/0609037
[11] Littlefield (1992). Characterizing and tuning communications performance for real applications. Proc. First Intel Delta Application Workshop, CCSF-14-92, pp. 179–190.
[12] O'Leary (1987). Systolic arrays for matrix transpose and other reorderings. IEEE Trans. Comput. 36, pp. 117–122. doi:10.1109/TC.1987.5009457
[13] Strang (1988). Linear Algebra and Its Applications.
[14] Takkella, Seidel (1994). Complete exchange and broadcast algorithms for meshes. Proc. Scalable High Performance Computing Conf., pp. 422–428.
URI	https://dx.doi.org/10.1016/0167-8191(95)00016-H