Big data analytics on Apache Spark

Bibliographic Details
Published in: International Journal of Data Science and Analytics, Volume 1, Issue 3-4, pp. 145-164
Main Authors: Salloum, Salman; Dautov, Ruslan; Chen, Xiaojun; Peng, Patrick Xiaogang; Huang, Joshua Zhexue
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.11.2016
Subjects:
ISSN: 2364-415X, 2364-4168
Description
Summary: Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project with an increasing number of contributors from both academia and industry, it is difficult for researchers, especially those who are new to this area, to comprehend the full body of development and research behind Apache Spark. In this paper, we present a technical review of big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark offers for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics.
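
As an illustration only (not drawn from the paper itself), the "in-memory programming model" and "language-integrated APIs" mentioned in the abstract refer to Spark's lazily evaluated distributed collections, which can be driven directly from Scala, Java, Python or R. A minimal Scala sketch of an RDD word-count pipeline is shown below; the local-mode SparkSession setup and the input path "input.txt" are assumptions for the example.

    import org.apache.spark.sql.SparkSession

    object WordCountExample {
      def main(args: Array[String]): Unit = {
        // Assumed local setup; a real cluster job would be launched via spark-submit.
        val spark = SparkSession.builder()
          .appName("WordCountExample")
          .master("local[*]")
          .getOrCreate()

        // RDDs are Spark's core distributed, in-memory abstraction.
        // Transformations (flatMap, map, reduceByKey) are lazy and only run
        // when an action such as collect() is invoked.
        val lines = spark.sparkContext.textFile("input.txt") // hypothetical input path
        val counts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach { case (word, n) => println(s"$word: $n") }

        spark.stop()
      }
    }

The same pipeline could be expressed through the DataFrame API or in Python or R; the language-integrated style, where transformations are ordinary method calls in the host language, is the point being illustrated.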
DOI: 10.1007/s41060-016-0027-9