Time stamp algorithms for runtime parallelization of DOACROSS loops with dynamic dependences

Bibliographic details
Published in:	IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 5, pp. 433-450
Main authors:	Xu, C.-Z., Chaudhary, V.
Format:	Journal Article
Language:	English
Publication details:	New York: IEEE, 1 May 2001; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithm design and analysis; Algorithms; Computation; Computational fluid dynamics; Computational modeling; Computer Society; Dynamics; Fluid dynamics; Gain; Parallel processing; Performance analysis; Runtime; Sequential analysis; Servers; Sparse matrices; Studies; Tradeoffs; Workload
ISSN:	1045-9219, 1558-2183

Description
Summary:	This paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads of the same memory element. Two variants of the algorithm are considered: One allows partially concurrent reads (PCR) and the other allows fully concurrent reads (FCR). Analyses of their time complexities derive a necessary condition with respect to the iteration workload for runtime parallelization. Experimental results for a Gaussian elimination loop, as well as an extensive set of synthetic loops on a 12-way SMP server, show that the time stamp algorithms outperform iteration-level parallelization techniques in most test cases and gain speedups over sequential execution for loops that have heavy iteration workloads. The PCR algorithm performs best because it makes a better trade-off between maximizing the parallelism and minimizing the analysis overhead. For loops with light or unknown iteration loads, an alternative speculative runtime parallelization technique is preferred.
DOI:	10.1109/71.926166
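
Illustrative note: the summary describes an inspector/executor scheme that orders individual memory references by time stamps and lets consecutive reads of the same element proceed together. The sketch below is not taken from the paper. It is a simplified, sequential illustration under assumed conditions: the loop has the hypothetical form a[w[i]] = f(a[r[i]], i), and the names inspect, run_by_stamp and run_sequential are invented here. The paper's inspector is parallel and its PCR and FCR variants differ in detail; neither is reproduced.

```python
from collections import defaultdict


def inspect(reads, writes):
    """Assign a time stamp to the read and the write of every iteration.

    Assumed loop shape: a[writes[i]] = f(a[reads[i]], i).
    Reads of one element with no write in between share a stamp.
    """
    last_write = {}  # element -> stamp of its most recent write
    last_read = {}   # element -> stamp of reads since that write
    read_ts, write_ts = [], []
    for r, w in zip(reads, writes):
        rt = last_write.get(r, 0) + 1  # a read follows the last write
        last_read[r] = rt
        # A write follows its own read and all earlier references to w.
        wt = max(rt, last_write.get(w, 0), last_read.get(w, 0)) + 1
        last_write[w] = wt
        last_read.pop(w, None)  # earlier reads of w are now ordered
        read_ts.append(rt)
        write_ts.append(wt)
    return read_ts, write_ts


def run_by_stamp(a, reads, writes, f):
    """Executor sketch: references with equal stamps form one wavefront."""
    a = list(a)
    read_ts, write_ts = inspect(reads, writes)
    waves = defaultdict(list)
    for i, (rt, wt) in enumerate(zip(read_ts, write_ts)):
        waves[rt].append(("read", i))
        waves[wt].append(("write", i))
    tmp = {}
    for stamp in sorted(waves):
        # Reversed on purpose: order inside a wavefront must not matter.
        for kind, i in reversed(waves[stamp]):
            if kind == "read":
                tmp[i] = a[reads[i]]
            else:
                a[writes[i]] = f(tmp[i], i)
    return a


def run_sequential(a, reads, writes, f):
    """Reference result: the original loop executed in program order."""
    a = list(a)
    for i, (r, w) in enumerate(zip(reads, writes)):
        a[w] = f(a[r], i)
    return a
```

In this sketch every reference within one wavefront touches a distinct element or is a read, so a real executor could hand each wavefront to a pool of workers; the reversed traversal above only simulates that freedom of ordering.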