Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

Bibliographic details
Published in:	The Journal of Supercomputing, vol. 59, no. 3, pp. 1229-1251
Main authors:	Wang, Li; Xue, Jingling; Yang, Xuejun
Format:	Journal Article
Language:	English
Published:	Boston: Springer US, 1 March 2012
Subject terms:	Compilers, Computer Science, Interpreters, Processor Architectures, Programming Languages, Stream register file, Stream programming model, Stream processor, Software pipelining, Loop unrolling
ISSN:	0920-8542, 1573-0484

Description
Abstract:	Both reuse and concurrency are performance-critical for stream processors. When applying loop unrolling and software pipelining separately to stream-level loops, either reuse or concurrency or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops. The key insight is that an unrolled and software-pipelined stream-level loop can be described by a set of reuse equations. Guided by these reuse equations, a reuse-aware modulo scheduling algorithm is developed to simultaneously optimize the two performance objectives, reuse and concurrency, for a loop in a unified framework. Moreover, we describe a code generation algorithm that automatically produces the optimized loop from a given loop. The experimental results obtained on FT64 and by simulation demonstrate the effectiveness of the proposed approach.
DOI:	10.1007/s11227-010-0522-z
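As background only: the abstract builds on modulo scheduling, which in its textbook form begins by computing a lower bound on the initiation interval (II), the number of cycles between the starts of successive loop iterations. The sketch below shows that standard bound, MII = max(ResMII, RecMII). It is not the authors' reuse-aware algorithm and does not model their reuse equations; the function names, the resource classes `kernel` and `mem`, and the example numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division for positive b.
    return (a + b - 1) // b


def res_mii(op_counts: Dict[str, int], units: Dict[str, int]) -> int:
    # Resource bound: every op of a resource class must fit into II cycles
    # on that class's units (assumes one cycle of occupancy per op).
    return max((ceil_div(op_counts[r], units[r]) for r in op_counts), default=1)


def rec_mii(cycles: List[Tuple[int, int]]) -> int:
    # Recurrence bound: each dependence cycle is (total latency, total
    # iteration distance); II must be at least latency / distance.
    return max((ceil_div(lat, dist) for lat, dist in cycles), default=1)


def min_ii(op_counts: Dict[str, int], units: Dict[str, int],
           cycles: List[Tuple[int, int]]) -> int:
    # The scheduler starts trying II at this lower bound and increases it
    # until a valid modulo schedule is found.
    return max(res_mii(op_counts, units), rec_mii(cycles), 1)


# Hypothetical stream-level loop: 2 kernel ops on 1 kernel unit,
# 3 stream memory ops on 2 memory units, one recurrence of latency 5
# spanning 2 iterations.
print(min_ii({"kernel": 2, "mem": 3}, {"kernel": 1, "mem": 2}, [(5, 2)]))  # prints 3
```

Here the recurrence, not the resources, limits the initiation interval; unrolling the loop changes both the op counts and the recurrence distances, which is one reason unrolling and software pipelining interact.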