A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

Bibliographic Details
Published in: Concurrency (Chichester, England), Vol. 10, No. 8, pp. 655-670
Main Author: Choi, Jaeyoung
Format: Journal Article
Language: English
Published: Chichester: John Wiley & Sons, Ltd, 01.07.1998
ISSN: 1040-3108, 1096-9128
Description
Summary: We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS (basic linear algebra subprograms) routine in each processor even when the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer. © 1998 John Wiley & Sons, Ltd.
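
The LCM block concept mentioned in the summary can be illustrated briefly. Under a two-dimensional block cyclic distribution over an nprow x npcol process grid, the pattern of block ownership repeats every lcm(nprow, npcol) blocks along each dimension, which is what lets each processor apply the sequential BLAS routine to larger groups of same-owner blocks even when the distribution block size is small. The minimal Python sketch below is not taken from the paper; the 2 x 3 grid size and the helper names are illustrative assumptions, and it demonstrates only the ownership periodicity on which that idea rests.

from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def block_owner(i, j, nprow, npcol):
    # Grid coordinates of the process owning block (i, j) under a
    # two-dimensional block cyclic distribution on an nprow x npcol grid.
    return (i % nprow, j % npcol)

# Hypothetical 2 x 3 process grid (an assumption for illustration).
# The ownership pattern repeats every lcm(nprow, npcol) blocks in each
# direction, so blocks landing on the same process within one LCM block
# can be grouped and multiplied with a single, larger sequential GEMM call.
nprow, npcol = 2, 3
period = lcm(nprow, npcol)          # 6 blocks
for i in range(period):
    for j in range(period):
        assert block_owner(i, j, nprow, npcol) == \
               block_owner(i + period, j + period, nprow, npcol)
print("block-cyclic ownership pattern repeats every", period, "blocks")
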
Bibliography: ark:/67375/WNG-M4QSB26M-M
ArticleID: CPE369
Funding: Korean Ministry of Information and Communication, No. 96087-IT1-I2
DOI: 10.1002/(SICI)1096-9128(199807)10:8<655::AID-CPE369>3.0.CO;2-O