Creating optimal code for GPU-accelerated CT reconstruction using ant colony optimization

Bibliographic details
Published in: Medical physics (Lancaster) Vol. 40; no. 3; p. 031110 - n/a
Main authors: Papenhausen, Eric; Zheng, Ziyi; Mueller, Klaus
Format: Journal Article
Language: English
Published: United States, American Association of Physicists in Medicine, 1 March 2013
ISSN: 0094-2405, 2473-4209
Description
Summary: Purpose: CT reconstruction algorithms implemented on the GPU are highly sensitive to their implementation details and to the hardware they run on. Fine-tuning an implementation for optimal performance can be a time-consuming task and may require many updates when the hardware changes. Some techniques fine-tune GPU code automatically. These techniques, however, are relatively narrow in what they tune and are often based on heuristics, which can be inaccurate. The goal of this paper is to present a framework that automates code optimization with maximum flexibility and produces a final result that is both efficient and readable to the user. Methods: The authors propose a method that tunes high-level implementation details by using the ant colony optimization algorithm to find the optimal implementation in a relatively short amount of time. The framework takes as input a file that describes a graph in which each path represents a potential implementation. It then uses the ant colony optimization algorithm to find the optimal path through this graph based on execution time and image quality. Results: Two experimental studies are carried out. Using the presented framework, the authors optimize the performance of a GPU-accelerated FDK backprojection implementation and a GPU-accelerated separable footprint backprojection implementation. They demonstrate that the resulting optimal implementation can differ depending on the hardware specifications. They then compare the results produced by the framework with those produced by manual optimization. Conclusions: The presented framework is a useful tool for increasing programmer productivity and reducing the overhead of leveraging hardware-specific resources. By performing an intelligent search, the framework produces a more efficient image reconstruction implementation in a shorter amount of time.
Bibliography: Electronic mail: epapenhausen@cs.sunysb.edu, zizhen@cs.sunysb.edu, mueller@cs.sunysb.edu
DOI: 10.1118/1.4773045
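
Illustration: The method outlined in the summary, a graph whose paths are candidate implementations searched by ant colony optimization, can be sketched roughly as follows. This is a minimal sketch and not the authors' framework. The stage names, the options, and every cost number are hypothetical. `evaluate` stands in for compiling and timing a candidate on real hardware, and the pheromone update is a plain textbook variant that may differ from the one used in the paper.

```python
import random

# Hypothetical decision graph: each stage is one implementation choice and a
# path through the graph picks exactly one option per stage.
STAGES = [
    ("block_size", [64, 128, 256]),
    ("memory", ["global", "texture", "shared"]),
    ("unroll", [1, 2, 4]),
]


def evaluate(path):
    """Stand-in cost for one candidate (lower is better); numbers are invented."""
    block, memory, unroll = path
    time_ms = 10.0
    time_ms += {64: 3.0, 128: 0.0, 256: 1.5}[block]
    time_ms += {"global": 4.0, "texture": 0.0, "shared": 1.0}[memory]
    time_ms += {1: 2.0, 2: 0.5, 4: 0.0}[unroll]
    quality_penalty = 0.0  # a real run would compare the image to a reference
    return time_ms + quality_penalty


def ant_colony_search(stages, cost_fn, n_ants=20, n_iters=30,
                      evaporation=0.3, seed=0):
    """Search for a low-cost path; returns (best_path, best_cost)."""
    rng = random.Random(seed)
    # One pheromone value per option in each stage, all equal at the start.
    pheromone = [[1.0] * len(options) for _, options in stages]
    best_path, best_cost = None, float("inf")

    for _ in range(n_iters):
        trails = []
        for _ in range(n_ants):
            # Each ant picks one option per stage, weighted by pheromone.
            idx = [rng.choices(range(len(row)), weights=row)[0]
                   for row in pheromone]
            path = tuple(stages[s][1][i] for s, i in enumerate(idx))
            cost = cost_fn(path)
            trails.append((idx, cost))
            if cost < best_cost:
                best_path, best_cost = path, cost

        # Evaporation: old trails fade so early choices do not dominate.
        for row in pheromone:
            for i in range(len(row)):
                row[i] *= 1.0 - evaporation

        # Deposit: cheaper paths leave more pheromone on the options they used.
        for idx, cost in trails:
            for s, i in enumerate(idx):
                pheromone[s][i] += 1.0 / cost

    return best_path, best_cost


if __name__ == "__main__":
    path, cost = ant_colony_search(STAGES, evaluate)
    print(dict(zip((name for name, _ in STAGES), path)), cost)
```

In an actual tuning run, `evaluate` would build the selected variant, time it on the target GPU, and add a penalty when image quality degrades, which is how the same graph could yield different optimal paths on different hardware.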