Spectral embedded generalized mean based k-nearest neighbors clustering with S-distance

Detailed bibliography
Published in: Expert Systems with Applications, Vol. 169, p. 114326
Main authors: Sharma, Krishna Kumar; Seal, Ayan
Format: Journal Article
Language: English
Publication details: Elsevier Ltd, 01.05.2021
ISSN: 0957-4174, 1873-6793
Description
Summary: The spectral clustering algorithm is widely used in many areas, especially in pattern recognition. Its promising results depend largely on the efficient construction of the neighborhood graph. Generally, the similarity matrix relies on the similarity measure applied between two data points, the selection of k-nearest neighbors (KNN), and the approach used to construct the neighborhood graph. In this study, we integrate S-distance into spectral clustering, which makes it capable of finding complex and non-linear cluster structures. Moreover, a generalized mean distance-based KNN is proposed to reduce sensitivity to the value of k. A symmetry-favored KNN method is also applied to construct the neighborhood graph, which reduces the impact of outliers and noisy data points. However, spectral clustering faces scalability and speedup issues on large datasets. Thus, the proposed spectral clustering algorithm is also executed in distributed environments. Several experiments are performed to validate the proposed clustering algorithm on 20 real-world datasets and 3 large datasets. Experimental results demonstrate that the proposed clustering algorithm outperforms some of the baseline methods in terms of accuracy and clustering error rates. Finally, we conduct Wilcoxon's Rank-Sum test and show that the results of the proposed spectral clustering algorithm are statistically significant.
• A churn prediction model is proposed using an enhanced spectral clustering (SC).
• A non-linear distance measure called S-distance is merged with the conventional SC.
• The proposed clustering algorithm is validated on 15 datasets.
• Three state-of-the-art methods are considered to compare with the proposed one.
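The summary describes a pipeline (distance measure, KNN selection, neighborhood graph, spectral embedding) without giving formulas. The following is only a minimal sketch of that general pipeline under stated assumptions, not the authors' implementation: plain Euclidean distance stands in for S-distance, the generalized mean distance-based KNN step is omitted, and "symmetry-favored" is read as the symmetrized graph W = A + Aᵀ (mutual neighbors get weight 2, one-way neighbors weight 1). All function names are hypothetical.

```python
import numpy as np

def pairwise_euclidean(X):
    # Stand-in distance; the record proposes S-distance here instead.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def symmetric_knn_graph(D, k):
    # Assumed reading of a symmetry-favored KNN graph: W = A + A^T,
    # where A[i, j] = 1 if j is among the k nearest neighbors of i.
    n = D.shape[0]
    D = D.copy()
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    nearest = np.argsort(D, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return A + A.T

def spectral_embedding(W, n_clusters):
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}; keep the
    # eigenvectors of the smallest eigenvalues, then normalize rows.
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(W.shape[0]) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    Y = vecs[:, :n_clusters]
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def kmeans(Y, n_clusters, n_iter=50):
    # Small k-means with deterministic farthest-point initialization.
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(X, n_clusters, k):
    W = symmetric_knn_graph(pairwise_euclidean(X), k)
    return kmeans(spectral_embedding(W, n_clusters), n_clusters)
```

To experiment with another distance measure, only `pairwise_euclidean` would need to be swapped out; the graph construction and embedding steps take a distance matrix and do not depend on how it was computed.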
DOI: 10.1016/j.eswa.2020.114326