A Domain Specific Approach to High Performance Heterogeneous Computing

Users of heterogeneous computing systems face two problems: first, in understanding the trade-off relationships between the observable characteristics of their applications, such as latency and quality of the result, and second, how to exploit knowledge of these characteristics to allocate work to d...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on parallel and distributed systems Ročník 28; číslo 1; s. 2 - 15
Hlavní autoři: Inggs, Gordon, Thomas, David B., Luk, Wayne
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.01.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:1045-9219, 1558-2183
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Users of heterogeneous computing systems face two problems: first, in understanding the trade-off relationships between the observable characteristics of their applications, such as latency and quality of the result, and second, how to exploit knowledge of these characteristics to allocate work to distributed computing platforms efficiently. A domain specific approach addresses both of these problems. By considering a subset of operations or functions, models of the observable characteristics or domain metrics may be formulated in advance, and populated at run-time for task instances. These metric models can then be used to express the allocation of work as a constrained integer program. These claims are illustrated using the domain of derivatives pricing in computational finance, with the domain metrics of workload latency and pricing accuracy. For a large, varied workload of 128 Black-Scholes and Heston model-based option pricing tasks, running upon a diverse array of 16 Multicore CPUs, GPUs and FPGAs platforms, predictions made by models of both the makespan and accuracy are generally within 10 percent of the run-time performance. When these models are used as inputs to machine learning and MILP-based workload allocation approaches, a latency improvement of up to 24 and 270 times over the heuristic approach is seen.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2016.2563427