Acceleration of computer based simulation, image processing, and data analysis using computer clusters with heterogeneous accelerators
| Title: | Acceleration of computer based simulation, image processing, and data analysis using computer clusters with heterogeneous accelerators |
|---|---|
| Authors: | Chen, Chong |
| Source: | Graduate Theses and Dissertations |
| Publisher Information: | eCommons |
| Publication Year: | 2016 |
| Collection: | University of Dayton: eCommons |
| Subject Terms: | Heterogeneous distributed computing systems, Parallel computers, Multiprocessors, Computer Engineering, parallel computing, distributed computing, GPGPU, Xeon Phi, Preconditioned Iterative Solver, ALS, bilateral filtering |
| Description: | With frequency scaling in microprocessors limited by power constraints, many-core and multi-core architectures have become the norm over the past decade. The goal of this work is to accelerate key computer simulation tools, data processing, and data analysis algorithms on multi-core and many-core computer clusters and to analyze their accelerated performance. The main contributions of this dissertation are: 1. Acceleration of vector bilateral filtering for hyperspectral imaging with GPGPU: a GPGPU-based acceleration of vector bilateral filtering, called vBF_GPU, was implemented in this dissertation. vBF_GPU uses multiple threads to process one pixel of a hyperspectral image to improve the efficiency of the cache memory. The memory access operations of vBF_GPU were optimized to reduce the data transfer cost of the GPGPU program. The experimental results indicate that vBF_GPU can provide up to 19x speedup compared with a multi-core CPU implementation and up to 3x speedup compared with a naive GPGPU implementation of vector bilateral filtering. vBF_GPU can process hyperspectral images with up to 266 spectral bands, and the window size of the bilateral filter is unlimited. 2. Optimization of acceleration of the alternating least squares algorithm using a GPGPU cluster: this study presented an optimized implementation of the Alternating Least Squares (ALS) algorithm to realize a large-scale matrix-factorization-based recommendation system. A GPGPU-optimized implementation was developed to run the batch solver in the ALS algorithm. An equivalent mathematical form of the equations was used to reduce the computational complexity of the ALS algorithm. A distributed version of this implementation was also developed and tested on a cluster of GPGPUs. The experimental results indicate that the application running on a GPGPU can achieve up to 3.8x speedup compared with an 8-core CPU, and the distributed implementation showed excellent scalability on a computer cluster with multiple ... |
| Document Type: | text |
| Language: | unknown |
| Relation: | https://ecommons.udayton.edu/graduate_theses/1207; http://rave.ohiolink.edu/etdc/view?acc_num=dayton148036732102682 |
| Availability: | https://ecommons.udayton.edu/graduate_theses/1207 http://rave.ohiolink.edu/etdc/view?acc_num=dayton148036732102682 |
| Rights: | Copyright © 2016, author |
| Accession Number: | edsbas.F44DCDCD |
| Database: | BASE |