Research on Distributed Parallelization of Improved Spectral Clustering Algorithm for Big Data
| Published in: | 2024 IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), pp. 544-549 |
|---|---|
| Main author: | |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 27 February 2024 |
| Subjects: | |
| Online access: | Full text |
| Summary: | In the field of data mining, clustering algorithms play a key role in extracting valuable insights from vast datasets without incorporating learning mechanisms. One such classical clustering approach is the spectral clustering algorithm. This algorithm effectively converts a clustering challenge into the segmentation of an undirected graph, enabling it to handle intricate non-convex datasets adeptly and avoid getting trapped in local optimization pitfalls. Nevertheless, the conventional spectral clustering technique relies on the Gaussian kernel function, which uses Euclidean distance to determine sample similarities. This method proves overly sensitive to the Gaussian kernel's parameters and fails to accurately represent inter-sample relationships. To address the drawbacks related to similarity measurement and the computational inefficiencies inherent in the traditional spectral clustering method, enhancements have been made to refine the clustering outcomes. The enhanced spectral clustering algorithm has been redesigned to be distributed and parallelized, a strategic move intended to bolster the processing ability when handling enormous datasets. |
| DOI: | 10.1109/EEBDA60612.2024.10485912 |
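
For orientation, the conventional pipeline that the summary criticizes (Gaussian kernel over Euclidean distances, followed by a graph cut) can be sketched roughly as below. This is not the paper's improved or distributed algorithm, which the record does not detail. The function names `gaussian_affinity` and `spectral_bipartition`, the bandwidth parameter `sigma`, and the toy data are illustrative assumptions, and the two-way split by the sign of the second eigenvector stands in for the k-means step normally used for more than two clusters.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Pairwise Gaussian-kernel similarity from squared Euclidean distances."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W

def spectral_bipartition(X, sigma):
    """Split samples into two groups via the second eigenvector of the Laplacian."""
    W = gaussian_affinity(X, sigma)
    degrees = W.sum(axis=1)
    # Assumes every degree is positive; a very small sigma can underflow them to 0.
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt      # map back to the random-walk form
    return (fiedler > 0).astype(int)

# Toy data (assumed for illustration): two small blobs, moderately separated.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(20, 2))
blob_b = rng.normal(0.0, 0.3, size=(20, 2)) + 4.0
X = np.vstack([blob_a, blob_b])
labels = spectral_bipartition(X, sigma=1.0)
```

The result depends on `sigma`: too small a bandwidth leaves the graph nearly disconnected, and too large a bandwidth makes all samples look alike. This is the parameter sensitivity the summary refers to.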