A technique for overlapping computation and communication for block recursive algorithms

Bibliographic Details
Published in:	Concurrency (Chichester, England.) Vol. 10; No. 2; pp. 73-90
Main Authors:	GUPTA, S. K. S., HUANG, C.-H., SADAYAPPAN, P., JOHNSON, R. W.
Format:	Journal Article
Language:	English
Published:	Chichester, John Wiley & Sons, Ltd, 1 February 1998
ISSN:	1040-3108, 1096-9128
Online Access:	Full text

Description
Abstract:	This paper presents a design methodology for developing efficient distributed‐memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed‐memory architecture with a circuit‐switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication‐efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented. © 1998 John Wiley & Sons, Ltd.
Bibliography:	ArticleID:CPE289 DARPA - No. 60NANB1D1151; No. 60NANB1D1150 istex:74A1774F73033FF7E93F5C62DCA9342DC9B20F57 ark:/67375/WNG-PCPQ8PRJ-9
DOI:	10.1002/(SICI)1096-9128(199802)10:2<73::AID-CPE289>3.0.CO;2-N
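
Illustrative note: the abstract describes representing block recursive algorithms with the tensor product. As a rough sketch of what such a representation can look like, the code below checks the standard Cooley-Tukey identity F_rs = (F_r ⊗ I_s) T (I_r ⊗ F_s) L on small inputs. This is the textbook factorization, not necessarily the exact formulation used in the paper, and every function name here is hypothetical. In a distributed-memory reading, the factors of the form I ⊗ F are independent block computations and the stride permutation L is the data movement; the sketch does not model the overlap of computation and communication itself.

```python
import cmath

def dft(x):
    """Naive O(m^2) DFT of a list of complex numbers (reference definition)."""
    m = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
            for k in range(m)]

def stride_permute(x, r, s):
    """Apply L: output index j1*s + j2 reads input index j1 + r*j2."""
    return [x[j1 + r * j2] for j1 in range(r) for j2 in range(s)]

def block_dfts(x, r, s):
    """Apply I_r (x) F_s: r independent DFTs on contiguous blocks of length s."""
    out = []
    for j1 in range(r):
        out.extend(dft(x[j1 * s:(j1 + 1) * s]))
    return out

def twiddle(x, r, s):
    """Apply the diagonal T: scale entry j1*s + k2 by w_n^(j1*k2), n = r*s."""
    n = r * s
    return [x[j1 * s + k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
            for j1 in range(r) for k2 in range(s)]

def strided_dfts(x, r, s):
    """Apply F_r (x) I_s: s independent DFTs over elements at stride s."""
    out = [0j] * (r * s)
    for k2 in range(s):
        column = dft([x[j1 * s + k2] for j1 in range(r)])
        for k1 in range(r):
            out[k1 * s + k2] = column[k1]
    return out

def factored_dft(x, r, s):
    """Evaluate (F_r (x) I_s) T (I_r (x) F_s) L on x, applying factors right to left."""
    assert len(x) == r * s
    permuted = stride_permute(x, r, s)
    local = block_dfts(permuted, r, s)
    scaled = twiddle(local, r, s)
    return strided_dfts(scaled, r, s)
```

For example, `factored_dft(x, 4, 2)` on a length-8 input should agree with `dft(x)` up to floating-point rounding.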