A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

Bibliographic details
Published in: Concurrency (Chichester, England.) Vol. 10, no. 8; pp. 655-670
Main author: Choi, Jaeyoung
Format: Journal Article
Language: English
Publication details: Chichester, John Wiley & Sons, Ltd, 1 July 1998
ISSN: 1040-3108, 1096-9128
Description
Summary: We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution‐independent matrix multiplication algorithm) for block cyclic data distribution on distributed‐memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept to obtain the maximum performance of the sequential BLAS (basic linear algebra subprograms) routine in each processor even when the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer. © 1998 John Wiley & Sons, Ltd.
Bibliography: ark:/67375/WNG-M4QSB26M-M
ArticleID: CPE369
Korean Ministry of Information and Communication - No. 96087-IT1-I2
istex:A9FCF3330AFF11591E0D4FF67C558D88C7365C82
DOI: 10.1002/(SICI)1096-9128(199807)10:8<655::AID-CPE369>3.0.CO;2-O