SparkSNN: A density-based clustering algorithm on Spark

Bibliographic Details
Published in:	2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA) pp. 433 - 437
Main Authors:	Aryal, Amar Mani; Wang, Sujing
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.03.2018
Subjects:	Big Data; Clustering algorithms; data mining; density-based clustering algorithm; Indexes; Merging; Partitioning algorithms; shared nearest neighbor clustering; Silicon; Spark; Sparks

Description
Summary:	Clustering is one of the most commonly used data mining techniques. Shared nearest neighbor clustering is an important density-based clustering technique that has been widely adopted in many application domains, such as environmental science and urban computing. As data sets grow extremely large, it is impossible to process large-scale data on a single machine. Therefore, the scalability problem of traditional clustering algorithms running on a single machine must be addressed. In this paper, we improve the traditional density-based clustering algorithm by utilizing a powerful programming platform (Spark) and distributed computing clusters. In particular, we design and implement a Spark-based shared nearest neighbor clustering algorithm called SparkSNN, a scalable density-based clustering algorithm on Spark for big data analysis. We conduct our experiments on real data, namely Maryland crime data, to evaluate the performance of the proposed algorithm with respect to speed-up and scale-up. The experimental results confirm the effectiveness and efficiency of the proposed SparkSNN clustering algorithm.
DOI:	10.1109/ICBDA.2018.8367722
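The summary names shared nearest neighbor (SNN) clustering without describing it. As a hedged illustration only, the following is a minimal single-machine sketch of one common SNN formulation: mutual k-nearest-neighbor lists, SNN similarity as the count of shared neighbors, SNN density, core points, and flood-fill merging of similar core points. It is not the SparkSNN algorithm from the paper, and the function names and the parameters `k`, `eps` and `min_pts` are hypothetical. Its O(n²) pairwise loops show why such a method becomes expensive on one machine as data grows.

```python
import math


# Hypothetical single-machine SNN sketch; not the SparkSNN implementation.
def knn_lists(points, k):
    """Return, for each point index, the set of its k nearest neighbours (excluding itself)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def snn_similarity(nbrs, i, j):
    """Shared-neighbour count, counted only when i and j are in each other's k-NN list."""
    if j in nbrs[i] and i in nbrs[j]:
        return len(nbrs[i] & nbrs[j])
    return 0


def snn_cluster(points, k, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    nbrs = knn_lists(points, k)
    sim = [[snn_similarity(nbrs, i, j) if i != j else 0 for j in range(n)] for i in range(n)]
    # SNN density: how many points have SNN similarity >= eps with point i.
    density = [sum(1 for j in range(n) if sim[i][j] >= eps) for i in range(n)]
    core = [i for i in range(n) if density[i] >= min_pts]

    labels = [-1] * n
    next_label = 0
    # Connect core points whose SNN similarity is at least eps (flood fill).
    for c in core:
        if labels[c] != -1:
            continue
        labels[c] = next_label
        stack = [c]
        while stack:
            u = stack.pop()
            for v in core:
                if labels[v] == -1 and sim[u][v] >= eps:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    # Attach each non-core point to its most similar core point, if similar enough.
    core_set = set(core)
    for i in range(n):
        if i in core_set or not core:
            continue
        best = max(core, key=lambda c: sim[i][c])
        if sim[i][best] >= eps:
            labels[i] = labels[best]
    return labels
```

For example, with two tight groups of four 2-D points and one distant outlier, `snn_cluster(points, k=3, eps=2, min_pts=3)` labels the groups 0 and 1 and the outlier -1. A distributed version would need to partition the k-NN search and the merging step across workers, which is the problem the paper addresses on Spark.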