Research on Distributed Parallelization of Improved Spectral Clustering Algorithm for Big Data

Bibliographic Details
Published in: 2024 IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), pp. 544-549
Main Author: Yang, Han
Format: Conference Proceeding
Language: English
Published: IEEE 27.02.2024
Description
Summary: In data mining, clustering algorithms play a key role in extracting valuable insights from large datasets without supervision. Spectral clustering is one classical clustering approach. It recasts a clustering problem as the partitioning of an undirected similarity graph, which lets it handle complex non-convex datasets well and avoid becoming trapped in local optima. However, conventional spectral clustering builds sample similarities with a Gaussian kernel over Euclidean distances. This makes the results overly sensitive to the Gaussian kernel parameters, and the resulting similarities do not accurately represent the relationships between samples. To address these weaknesses in similarity measurement, as well as the computational inefficiency of the traditional method, an improved spectral clustering algorithm is proposed to refine the clustering results. The improved algorithm is then redesigned as a distributed, parallel implementation to increase its capacity for processing very large datasets.
DOI:10.1109/EEBDA60612.2024.10485912
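The summary names the baseline this work improves on. It does not spell out the improved similarity measure or the distributed design. For reference only, here is a minimal NumPy sketch of the conventional method it criticizes: normalized spectral clustering in the style of Ng, Jordan and Weiss, with the Gaussian-kernel affinity W_ij = exp(-||x_i - x_j||^2 / (2σ^2)) over Euclidean distances. Several details are illustrative assumptions, not the author's code: the function names, the σ value, the farthest-point k-means initialization, and the two-ring test data.

```python
import numpy as np

# Baseline (conventional) spectral clustering only; NOT the paper's improved
# similarity measure or its distributed implementation. Names are hypothetical.


def gaussian_affinity(X, sigma):
    """Dense similarity matrix W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W


def spectral_embedding(W, k):
    """Rows of the top-k eigenvectors of D^{-1/2} W D^{-1/2}, scaled to unit length."""
    d = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(U, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((U[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma):
    """Gaussian affinity -> normalized spectral embedding -> k-means on the rows."""
    return kmeans(spectral_embedding(gaussian_affinity(X, sigma), k), k)


# Usage: two concentric rings, a non-convex case that k-means alone cannot split.
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([ring, 5.0 * ring])  # inner ring radius 1, outer ring radius 5
labels = spectral_clustering(X, k=2, sigma=0.5)
print(labels[:100], labels[100:])
```

With σ = 0.5, the affinity between the two rings is negligible, so the graph cut follows the rings. You can re-run with a much larger `sigma` (for example 10) to see the parameter sensitivity the summary describes. The cross-ring affinities are then no longer small, and the ring separation can break down.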