Research on Distributed Parallelization of Improved Spectral Clustering Algorithm for Big Data

Bibliographic details
Published in:	2024 IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), pp. 544-549
Main author:	Yang, Han
Format:	Conference proceeding
Language:	English
Published:	IEEE, 27 February 2024
Subjects:	Big Data; Clustering algorithms; Clustering methods; data partitioning; density-sensitive similarity; distributed parallelization; Electrical engineering; Euclidean distance; Learning systems; Partitioning algorithms; Spectral clustering
Online access:	Full text

Description
Summary:	In the field of data mining, clustering algorithms play a key role in extracting valuable insights from vast datasets without incorporating learning mechanisms. One such classical clustering approach is the spectral clustering algorithm. This algorithm effectively converts a clustering challenge into the segmentation of an undirected graph, enabling it to handle intricate non-convex datasets adeptly and avoid getting trapped in local optimization pitfalls. Nevertheless, the conventional spectral clustering technique relies on the Gaussian kernel function, which uses Euclidean distance to determine sample similarities. This method proves overly sensitive to the Gaussian kernel's parameters and fails to accurately represent inter-sample relationships. To address the drawbacks related to similarity measurement and the computational inefficiencies inherent in the traditional spectral clustering method, enhancements have been made to refine the clustering outcomes. The enhanced spectral clustering algorithm has been redesigned to be distributed and parallelized, a strategic move intended to bolster the processing ability when handling enormous datasets.
DOI:	10.1109/EEBDA60612.2024.10485912
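
The summary describes the conventional baseline: a Gaussian kernel over Euclidean distances defines the similarity graph, and the clustering is read off the graph's spectrum. The sketch below illustrates only that baseline for the two-cluster case. It is not the paper's improved density-sensitive similarity or its distributed design, neither of which this record specifies. The function names `gaussian_affinity` and `spectral_bipartition`, the kernel width `sigma`, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Pairwise similarity w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    `sigma` is the kernel width the summary calls a sensitive parameter
    (name and default-free signature are illustrative assumptions).
    """
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W


def spectral_bipartition(W):
    """Split the graph in two using the sign of the second eigenvector."""
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    # Rescale the second eigenvector to the random-walk Laplacian's form,
    # where it is roughly constant within each well-separated cluster.
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (made-up toy data).
    X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                  [3.0, 0.0], [3.2, 0.0], [3.0, 0.2]])
    W = gaussian_affinity(X, sigma=1.0)
    print(spectral_bipartition(W))
```

Every weight in `W` depends on `sigma`, so the graph being cut changes with that single global parameter. This is the dependence on kernel parameters that the summary identifies as a drawback of the conventional method. For more than two clusters, the usual extension runs k-means on the first k eigenvectors, which is omitted here to keep the sketch short.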