Big data analytics on Apache Spark


Bibliographic details
Published in: International Journal of Data Science and Analytics, Volume 1, Issue 3-4, pp. 145-164
Main authors: Salloum, Salman; Dautov, Ruslan; Chen, Xiaojun; Peng, Patrick Xiaogang; Huang, Joshua Zhexue
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.11.2016
ISSN: 2364-415X, 2364-4168
Description
Summary: Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project, with an increasing number of contributors from both academia and industry, it is difficult for researchers to comprehend the full body of development and research behind Apache Spark, especially those who are beginners in this area. In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics.
DOI: 10.1007/s41060-016-0027-9
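
Illustrative example (not from the reviewed paper): a minimal Scala sketch of the language-integrated API and in-memory caching that the abstract refers to, using Spark's structured data processing on a DataFrame. The input file events.json and the column name category are hypothetical placeholders.

// Minimal sketch, assuming Spark is on the classpath and a local run.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkApiSketch")
      .master("local[*]")          // local mode; a cluster URL would be used in production
      .getOrCreate()

    import spark.implicits._

    // Structured data processing: read a (hypothetical) JSON file into a DataFrame
    // and keep it in memory for reuse, illustrating Spark's in-memory model.
    val df = spark.read.json("events.json").cache()

    // Language-integrated query: group, aggregate and sort with the Scala API.
    df.groupBy($"category")
      .agg(count("*").as("n"))
      .orderBy(desc("n"))
      .show()

    spark.stop()
  }
}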