A new parallel matrix multiplication algorithm on distributed-memory concurrent computers
| Published in: | Concurrency (Chichester, England.) Vol. 10; no. 8; pp. 655–670 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Chichester: John Wiley & Sons, Ltd, 1 July 1998 |
| ISSN: | 1040-3108, 1096-9128 |
| Summary: | We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution‐independent matrix multiplication algorithm) for block cyclic data distribution on distributed‐memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept to obtain the maximum performance of the sequential BLAS (basic linear algebra subprograms) routine in each processor even when the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer. © 1998 John Wiley & Sons, Ltd. |
|---|---|
| Bibliography: | ark:/67375/WNG-M4QSB26M-M; ArticleID:CPE369; Korean Ministry of Information and Communication - No. 96087-IT1-I2; istex:A9FCF3330AFF11591E0D4FF67C558D88C7365C82 |
| DOI: | 10.1002/(SICI)1096-9128(199807)10:8<655::AID-CPE369>3.0.CO;2-O |
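
The summary refers to a block cyclic data distribution and to an "LCM block concept" without spelling either out. The following is a minimal sketch, not the article's implementation: it assumes the usual 2D block cyclic rule, in which block (i, j) of a matrix lives on process (i mod P, j mod Q) of a P × Q process grid, and it shows only the arithmetic fact that ownership along the inner dimension repeats with period lcm(P, Q). The function names (`owner`, `inner_block_owners`) and the grid size are hypothetical and chosen for illustration.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(block_row: int, block_col: int, P: int, Q: int) -> tuple[int, int]:
    """Process coordinates holding block (block_row, block_col).

    Assumption: standard 2D block cyclic mapping on a P x Q grid.
    """
    return (block_row % P, block_col % Q)


def inner_block_owners(k: int, P: int, Q: int) -> tuple[int, int]:
    """For inner-dimension block index k in C = A * B, return
    (process column holding the k-th column block of A,
     process row holding the k-th row block of B)."""
    return (k % Q, k % P)


# Hypothetical 2 x 3 process grid.
P, Q = 2, 3
period = lcm(P, Q)

# Ownership pattern over two full periods of inner-dimension blocks.
owners = [inner_block_owners(k, P, Q) for k in range(2 * period)]

# Blocks whose indices differ by lcm(P, Q) sit on the same process
# column (for A) and process row (for B).
same_after_period = owners[:period] == owners[period:]
```

With these assumptions, inner-dimension blocks k and k + lcm(P, Q) always pair the same process column of A with the same process row of B, which is the periodicity that makes it possible to group such blocks together. How DIMMA actually schedules and combines them is described in the article itself, not here.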