Design and implementation of reconfigurable acceleration for in-memory distributed big data computing

Bibliographic details
Published in: Future Generation Computer Systems, Vol. 92, pp. 68-75
Main authors: Hou, Junjie; Zhu, Yongxin; Du, Sen; Song, Shijin
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 1 March 2019
ISSN:0167-739X, 1872-7115
Description
Summary: Apache Spark is an efficient distributed computing framework for big data processing. It supports in-memory computation on RDDs (Resilient Distributed Datasets) and provides reusability, fault tolerance, and real-time stream processing. However, tasks in the Spark framework are performed only on CPUs, whose low degree of parallelism and power inefficiency may restrict the performance and scalability of the cluster. Heterogeneous accelerators such as FPGAs, GPUs, and MICs (Many Integrated Core) are more efficient than general-purpose processors in big data processing and can improve the performance and power dissipation of the data center. In this work, we propose a framework that integrates FPGA accelerators into a Spark cluster, improving performance and reducing power dissipation for distributed applications. We propose a method for connecting Spark with OpenCL applications (OpenCL is a standard for cross-platform, parallel programming of diverse processors that is widely used in heterogeneous computing) and use FPGAs to accelerate Spark tasks developed in Python. We illustrate the performance and energy efficiency of the FPGA-based Spark framework with a case study of K-means algorithm acceleration. The results show that the FPGA-based Spark implementation achieves a 3.5x speedup and 4.06x the energy efficiency of the original Spark framework.
• A methodology for integrating FPGA accelerators into the Spark framework is proposed.
• The execution of K-means clustering is optimized in the FPGA-based Spark implementation.
• A case study of K-means algorithm acceleration on the FPGA-based Spark framework is presented.
DOI:10.1016/j.future.2018.09.049
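
The K-means case study named in the summary can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and uses neither Spark nor OpenCL; it only shows the shape of the computation: a per-partition assignment step (the kind of data-parallel work a framework could hand to an accelerator) followed by a merge of the partial results. All function names here (`assign_partition`, `update_centroids`, `kmeans`) are hypothetical, and plain lists stand in for the partitions of an RDD.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Sequence[float]
Partial = Tuple[int, Tuple[List[float], int]]  # (cluster id, (coordinate sums, count))


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_partition(points: Iterable[Point], centroids: List[Point]) -> List[Partial]:
    """Hypothetical per-partition step: assign each point to its nearest
    centroid and accumulate per-cluster coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    # Only clusters that received at least one point are reported.
    return [(k, (sums[k], counts[k])) for k in range(len(centroids)) if counts[k]]


def update_centroids(partials: Iterable[Partial], centroids: List[Point]) -> List[List[float]]:
    """Merge partial sums from all partitions and recompute each centroid.
    A cluster that received no points keeps its previous centroid."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for k, (s, n) in partials:
        counts[k] += n
        for d in range(dim):
            sums[k][d] += s[d]
    return [
        [v / counts[k] for v in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(partitions: List[List[Point]], centroids: List[Point],
           iterations: int = 10) -> List[List[float]]:
    """Run a fixed number of K-means iterations over in-memory partitions."""
    for _ in range(iterations):
        partials = [pair for part in partitions
                    for pair in assign_partition(part, centroids)]
        centroids = update_centroids(partials, centroids)
    return centroids
```

In a PySpark setting, a function shaped like `assign_partition` is the sort of thing one might pass to `RDD.mapPartitions`, with the merge done on the driver or in a reduce step; whether the paper structures its offload this way is an assumption, not something the record states.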