An Efficient Spectral Clustering Algorithm Based on Granular-Ball

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 9, pp. 9743-9753
Main authors:	Xie, Jiang; Kong, Weiyu; Xia, Shuyin; Wang, Guoyin; Gao, Xinbo
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 1 September 2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Approximation algorithms; Clustering; Clustering algorithms; Clustering methods; Datasets; granular computing; granular-ball; Machine learning algorithms; Matrix decomposition; Partitioning algorithms; Similarity; spectral clustering; Time complexity
ISSN:	1041-4347, 1558-2191
Online access:	Full text

Description
Abstract:	Traditional spectral clustering is time-consuming and resource-intensive on large-scale data, which leads to poor clustering results or makes clustering infeasible altogether. To address this problem, this paper proposes a spectral clustering algorithm based on granular-ball (GBSC). The algorithm changes how the similarity matrix is constructed: building it on granular-balls greatly reduces the size of the matrix and makes its construction more reasonable. Experimental results show that the proposed algorithm achieves a better speedup ratio, lower memory consumption, and stronger noise robustness, while producing clustering results similar to those of traditional spectral clustering. Let $m$ be the number of granular-balls and $n$ the number of points in the dataset, with $m \ll n$; the time complexity of GBSC is then $O(m^{3})$. This shows that GBSC adapts well to large-scale datasets. All code has been released at https://github.com/xjnine/GBSC.
DOI:	10.1109/TKDE.2023.3249475
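
The abstract's central point is that the similarity matrix is built over m coarse units instead of n points. The sketch below only illustrates that size reduction and is not the authors' implementation (see the linked repository for that). The grid-based grouping in `coarse_balls` is a hypothetical stand-in for granular-ball generation, and the Gaussian similarity between ball centers, the `cell` and `sigma` parameters, and the synthetic data are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def coarse_balls(points, cell=1.0):
    """Group 2-D points by grid cell and return (center, radius, members) per cell.

    Hypothetical stand-in for granular-ball generation, NOT the paper's procedure.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    balls = []
    for members in cells.values():
        cx = sum(points[i][0] for i in members) / len(members)
        cy = sum(points[i][1] for i in members) / len(members)
        radius = max(math.dist((cx, cy), points[i]) for i in members)
        balls.append(((cx, cy), radius, members))
    return balls


def similarity_matrix(balls, sigma=1.0):
    """Gaussian similarity between ball centers (an assumed choice): m x m."""
    m = len(balls)
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d = math.dist(balls[i][0], balls[j][0])
            W[i][j] = W[j][i] = math.exp(-d * d / (2 * sigma ** 2))
    return W


# Synthetic data: two well-separated Gaussian blobs of 500 points each.
random.seed(0)
points = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(500)]
points += [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(500)]

balls = coarse_balls(points, cell=1.0)
W = similarity_matrix(balls)
n, m = len(points), len(balls)
print(f"n = {n} points -> m = {m} balls")
print(f"similarity entries: {n * n} (point-level) vs {m * m} (ball-level)")
```

A spectral clustering step would then run on `W` rather than on a point-level matrix. A dense eigendecomposition of an m x m matrix costs on the order of m^3 operations, which is consistent with the $O(m^{3})$ complexity stated in the abstract.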