Parallel matrix transpose algorithms on distributed memory concurrent computers

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P/spl times/Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide app...

Celý popis

Uložené v:

Podrobná bibliografia
Vydané v:	Proceedings of the Scalable Parallel Libraries Conference , October 6-8, 1993, Mississippi State, Mississippi s. 245 - 252
Hlavní autori:	Jaeyoung Choi, Dongarra, J.J., Walker, D.W.
Médium:	Konferenčný príspevok..
Jazyk:	English
Vydavateľské údaje:	IEEE Comput. Soc. Press 1993
Predmet:	Application software Computer architecture Concurrent computing Distributed computing Laboratories Lifting equipment Linear algebra Matrix decomposition Packaging Scattering
ISBN:	0818649801, 9780818649806
On-line prístup:	Získať plný text
Tagy:	Pridať tag Žiadne tagy, Buďte prvý, kto otaguje tento záznam!

Popis
Shrnutí:	This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P/spl times/Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C=A/spl middot/B, the algorithms are used to compute parallel multiplications of transposed matrices, C=A/sup T//spl middot/B/sup T/, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.< >
ISBN:	0818649801 9780818649806
DOI:	10.1109/SPLC.1993.365559