Exploiting loop-dependent Stream Reuse for stream processors

The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory accesses can be reduced. In current stream compilers reuse is only attempted for simple stream references, those whose start...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	PACT'08 : proceedings of the Seventeenth International Conference on Parallel Architectures and Compilation Techniques : Toronto, Ontario, Canada, October 25-29, 2008 s. 22 - 31
Hlavní autoři:	Yang, Xuejun, Zhang, Ying, Xue, Jingling, Rogers, Ian, Li, Gen, Wang, Guibin
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	ACM 01.10.2008
Témata:	Computer architecture Generators Kernel Program processors Programming Stream Professor Stream Programming Model Stream Register File Stream Reuse StreamC Streaming media System-on-chip
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory accesses can be reduced. In current stream compilers reuse is only attempted for simple stream references, those whose start and end are known. Compiler analysis from outside of stream processors does not directly enable the consideration of other complex stream references. In this paper we propose a transformation to automatically optimize stream programs to exploit the reuse supplied by loop-dependent stream references. The transformation is based on three results: algorithms to recognize the reuse supplied by stream references, a new abstract expression called the Stream Reuse Graph (SRG) to depict the reuse and the optimization of the SRG for the transformation. Both the reuse between whole sequences accessed by stream references and that between partial sequences are exploited in the paper. In particular, the problem of exploiting partial stream reuse does not have its parallel in the traditional data reuse exploitation setting (for scalars and arrays). Finally, we have implemented our techniques using the StreamC/KernelC compiler for Imagine. Experimental results show a resultant speedup of 1.14 to 2.54 times using a range of typical stream processing application kernels.
DOI:	10.1145/1454115.1454121