Improved CURE clustering for big data using Hadoop and Mapreduce
| Published in: | 2016 International Conference on Inventive Computation Technologies (ICICT), Volume 3, pp. 1-5 |
|---|---|
| Main authors: | , |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 01.08.2016 |
| Subject: | |
| Online access: | Get full text |
| Summary: | In the era of information, extracting useful knowledge from massive amounts of data and processing it in less time has become a crucial part of data mining. CURE is a useful hierarchical clustering algorithm that can identify clusters of arbitrary shape and detect outliers. In this paper we implement the CURE clustering algorithm in a distributed environment using Apache Hadoop. MapReduce has become a very popular paradigm for storing and processing huge amounts of data. Mapper and Reducer routines are designed for the CURE algorithm; a sketch of such a structure is given after this record. We also discuss how different parameters affect the quality of the clusters by removing outliers. |
| DOI: | 10.1109/INVENTIVE.2016.7830238 |
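
The summary states that Mapper and Reducer routines were designed for CURE over Hadoop, but the record does not describe their structure. Below is a minimal, hypothetical sketch of a partition-then-cluster MapReduce pass in that spirit: the class names, the `NUM_PARTITIONS` parameter, and the `localCure` stub are illustrative assumptions, not the authors' implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CureMapReduceSketch {

    /** Mapper: scatter input points into a fixed number of partitions. */
    public static class PartitionMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {
        private static final int NUM_PARTITIONS = 8; // assumed tuning parameter
        private final IntWritable partition = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is assumed to be a comma-separated point, e.g. "1.2,3.4".
            partition.set((int) (key.get() % NUM_PARTITIONS));
            context.write(partition, value);
        }
    }

    /** Reducer: cluster one partition locally and emit representative points. */
    public static class CureReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable key, Iterable<Text> points, Context context)
                throws IOException, InterruptedException {
            List<double[]> data = new ArrayList<>();
            for (Text p : points) {
                String[] parts = p.toString().split(",");
                double[] point = new double[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    point[i] = Double.parseDouble(parts[i]);
                }
                data.add(point);
            }
            // localCure is a placeholder for a sequential CURE pass that merges
            // clusters hierarchically, keeps a few well-scattered representatives
            // per cluster, shrinks them toward the centroid, and drops sparse
            // clusters as outliers.
            for (double[] rep : localCure(data)) {
                context.write(key, new Text(toCsv(rep)));
            }
        }

        private static List<double[]> localCure(List<double[]> data) {
            return data; // illustrative stub only
        }

        private static String toCsv(double[] p) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < p.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(p[i]);
            }
            return sb.toString();
        }
    }
}
```

In this sketch each reducer sees one partition and produces only representative points, so a subsequent pass (or a driver-side merge) can combine partitions into global clusters; how the paper actually distributes that merge step is not specified in the record.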