Enhancing Big Data Clustering: The Improved K-Means - Artificial Bee Colony Algorithm with MapReduce

Bibliographic Details
Title: Enhancing Big Data Clustering: The Improved K-Means - Artificial Bee Colony Algorithm with MapReduce
Authors: Satish S. Banait
Source: Panamerican Mathematical Journal. 33:125-134
Publisher Information: Science Research Society, 2024.
Publication Year: 2024
Description: The volume and diversity of data produced by scientific applications and the corporate world have increased drastically in the modern era. Large datasets are challenging to collect, store, transform, and analyse, and one major problem with big data is that typical algorithms take longer to run. Clustering is one of the most common data mining tasks and is applied in several fields. The K-Means clustering algorithm is widely recognized as one of the prominent unsupervised learning techniques in machine learning; its advantages include conceptual clarity, good results, and ease of implementation. As the online world grew quickly, the number of data collection points rose with it, marking the era of big data and an explosion of information. This research introduces the IK-ABC algorithm (Improved K-Means - Artificial Bee Colony) to tackle various challenges encountered in K-Means clustering algorithms. These issues encompass limitations in global search capabilities, the sensitivity of cluster center selection, randomness in initialization, early-stage development, and sluggish convergence observed in the original artificial bee colony algorithm. To expedite computation and enhance the effectiveness of the iterative optimization process, a custom fitness function tailored for the K-Means clustering technique and a position update formula relying on global guidance were developed using MapReduce.
Document Type: Article
ISSN: 1064-9735
DOI: 10.52783/pmj.v33.i3.885
Accession Number: edsair.doi...........f64e42130c6c63333e360a88e64678d3
Database: OpenAIRE
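Illustrative note: the abstract mentions a K-Means-specific fitness function and a globally guided position update but does not state either formula. The sketch below is one common way such a step is written, not the paper's method. It assumes a fitness of 1 / (1 + SSE) and a gbest-guided artificial bee colony move on a single coordinate; the names `sse`, `fitness`, `guided_update`, `search`, and the constant `c` are hypothetical. It runs on one machine and omits the MapReduce distribution as well as the onlooker and scout phases.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


def fitness(points, centers):
    """Assumed fitness (not from the paper): higher is better, 1 / (1 + SSE)."""
    return 1.0 / (1.0 + sse(points, centers))


def guided_update(x, neighbor, best, rng, c=1.5):
    """Assumed gbest-guided move: perturb one coordinate of one center.

    new = x + phi * (x - neighbor) + psi * (best - x),
    with phi in [-1, 1] and psi in [0, c].
    """
    k = rng.randrange(len(x))      # which center to perturb
    j = rng.randrange(len(x[k]))   # which coordinate of that center
    phi = rng.uniform(-1.0, 1.0)
    psi = rng.uniform(0.0, c)
    candidate = [list(center) for center in x]  # copy; x is left unchanged
    candidate[k][j] = (
        x[k][j] + phi * (x[k][j] - neighbor[k][j]) + psi * (best[k][j] - x[k][j])
    )
    return candidate


def search(points, k, n_sources=10, iterations=200, seed=0):
    """Employed-bee-style loop with greedy selection over candidate center sets."""
    rng = random.Random(seed)
    # Each food source is a full set of k centers drawn from the data points.
    sources = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_sources)]
    best = max(sources, key=lambda s: fitness(points, s))
    for _ in range(iterations):
        for i, x in enumerate(sources):
            others = [m for m in range(n_sources) if m != i]
            neighbor = sources[rng.choice(others)]
            candidate = guided_update(x, neighbor, best, rng)
            if fitness(points, candidate) > fitness(points, x):
                sources[i] = candidate  # keep the move only if it improves fitness
        best = max(sources + [best], key=lambda s: fitness(points, s))
    return best
```

A call such as `search(points, k=2)` on a list of numeric tuples returns the best set of centers found. In a MapReduce setting, the per-point nearest-center distances inside `sse` are the part that would typically be computed in mappers and summed in reducers.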