Acceleration of computer based simulation, image processing, and data analysis using computer clusters with heterogeneous accelerators

Bibliographic Details
Title: Acceleration of computer based simulation, image processing, and data analysis using computer clusters with heterogeneous accelerators
Authors: Chen, Chong
Source: Graduate Theses and Dissertations
Publisher Information: eCommons
Publication Year: 2016
Collection: University of Dayton: eCommons
Subject Terms: Heterogeneous distributed computing systems, Parallel computers, Multiprocessors, Computer Engineering, parallel computing, distributed computing, GPGPU, Xeon Phi, Preconditioned Iterative Solver, ALS, bilateral filtering
Description: With the limits to frequency scaling in microprocessors due to power constraints, many-core and multi-core architectures have become the norm over the past decade. The goal of this work is the acceleration of key computer simulation tools, data processing, and data analysis algorithms on multi-core and many-core computer clusters, and the analysis of their accelerated performance. The main contributions of this dissertation are: 1. Acceleration of vector bilateral filtering for hyperspectral imaging with GPGPU: a GPGPU-based implementation of vector bilateral filtering, called vBF_GPU, was developed in this dissertation. vBF_GPU uses multiple threads to process each pixel of a hyperspectral image in order to improve cache efficiency. The memory access operations of vBF_GPU were optimized to reduce the data transfer cost of the GPGPU program. The experimental results indicate that vBF_GPU can provide up to 19x speedup compared with a multi-core CPU implementation and up to 3x speedup compared with a naive GPGPU implementation of vector bilateral filtering. vBF_GPU can process hyperspectral images with up to 266 spectral bands and places no limit on the window size of the bilateral filter. 2. Optimization of acceleration of alternative least square algorithm using GPGPU cluster: this study presents an optimized implementation of the Alternative Least Square (ALS) algorithm for large-scale, matrix-factorization-based recommendation systems. A GPGPU-optimized implementation was developed for the batch solver in the ALS algorithm. An equivalent mathematical form of the equations was used to reduce the computational complexity of the ALS algorithm. A distributed version of this implementation was also developed and tested on a cluster of GPGPUs. The experimental results indicate that the application running on a single GPGPU can achieve up to 3.8x speedup compared with an 8-core CPU. The distributed implementation showed excellent scalability on a computer cluster with multiple ...
Document Type: text
Language: unknown
Relation: https://ecommons.udayton.edu/graduate_theses/1207; http://rave.ohiolink.edu/etdc/view?acc_num=dayton148036732102682
Availability: https://ecommons.udayton.edu/graduate_theses/1207
http://rave.ohiolink.edu/etdc/view?acc_num=dayton148036732102682
Rights: Copyright © 2016, author
Accession Number: edsbas.F44DCDCD
Database: BASE
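Illustrative sketch: the "batch solver" named in contribution 2 can be pictured with a small CPU-only example. In ALS, once one factor matrix is held fixed, each row of the other factor is the solution of its own small regularized least-squares system, so an entire side can be solved as one batch of independent systems. The NumPy code below is an assumption-laden illustration and not the dissertation's GPGPU implementation: the function names `als`, `solve_side`, and `rmse`, the boolean `mask` convention for observed ratings, and the `rank`, `reg`, and `iters` parameters are all hypothetical, and the sketch does not attempt to reproduce the "equivalent mathematical form" mentioned in the abstract.

```python
import numpy as np


def solve_side(R, M, F, reg_eye):
    """Solve for one factor matrix while the other factor F is held fixed.

    R: (n, m) ratings, M: (n, m) 0/1 mask of observed entries,
    F: (m, k) fixed factor, reg_eye: (k, k) regularization term reg * I.
    """
    # One k x k normal-equation matrix per row:
    # A[i] = sum_j M[i, j] * F[j] F[j]^T + reg * I
    A = np.einsum("ij,jk,jl->ikl", M, F, F) + reg_eye
    # One right-hand side per row: b[i] = sum_j M[i, j] * R[i, j] * F[j]
    b = (M * R) @ F
    # Batched solve: n independent k x k systems in a single call.
    return np.linalg.solve(A, b[..., None])[..., 0]


def als(R, mask, rank=2, reg=0.1, iters=10, seed=0):
    """Approximate R by U @ V.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    reg_eye = reg * np.eye(rank)
    M = mask.astype(float)
    for _ in range(iters):
        U = solve_side(R, M, V, reg_eye)      # update row factors, V fixed
        V = solve_side(R.T, M.T, U, reg_eye)  # update column factors, U fixed
    return U, V


def rmse(R, mask, U, V):
    """Root-mean-square error over the observed entries only."""
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    R = rng.normal(size=(12, 3)) @ rng.normal(size=(3, 9))  # synthetic low-rank data
    mask = rng.random(R.shape) < 0.7                        # about 70% observed
    U, V = als(R, mask, rank=3, reg=0.05, iters=20)
    print("RMSE on observed entries:", rmse(R, mask, U, V))
```

Because the per-row systems are independent of one another, this step lends itself to the kind of batched, data-parallel execution that a GPGPU provides; the sketch only shows the structure of the computation, not any performance characteristics.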