Two improved k-means algorithms

Detailed bibliography
Published in:	Applied Soft Computing, Vol. 68, pp. 747-755
Authors:	Yu, Shyr-Shen; Chu, Shao-Wei; Wang, Chuin-Mu; Chan, Yung-Kuan; Chang, Ting-Cheng
Format:	Journal Article
Language:	English
Publisher:	Elsevier B.V., 1 July 2018
Subjects:	Genetic algorithm; k-Means algorithm; Noise data; Online machine learning; Outlier
ISSN:	1568-4946, 1872-9681

Description
Abstract:	The k-means algorithm is the most commonly used simple clustering method. For large volumes of high-dimensional numerical data, it provides an efficient way to group similar data into the same cluster. In this study, a tri-level k-means algorithm and a bi-layer k-means algorithm are proposed. The k-means algorithm is vulnerable to outliers and noisy data, and it is also sensitive to the choice of initial cluster centers. The tri-level k-means algorithm can overcome these drawbacks. When the data in a dataset S change frequently, the trained cluster centers may, after some time, no longer describe the data in each cluster precisely, so the cluster centers need to be updated. To solve this problem, this paper also provides an online machine learning based tri-level k-means algorithm. When the data within a cluster differ significantly from one another, a single cluster center cannot precisely describe every datum in that cluster. Noisy data, outliers, and widely differing values within the same cluster may degrade the performance of pattern matching systems. The bi-layer k-means algorithm can deal with these problems. In addition, a genetic-based algorithm is provided to derive the fittest parameters for the tri-level and bi-layer k-means algorithms. Experimental results demonstrate that both algorithms can provide much higher classification accuracy than the traditional k-means algorithm.
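For reference, here is a minimal sketch of the traditional k-means baseline (Lloyd's iteration: assign each point to its nearest center, then move each center to the mean of its points). It is not the authors' tri-level or bi-layer algorithm, whose details the abstract does not give. The function names `kmeans`, `squared_distance`, and `mean` are hypothetical, and the initial centers are passed in explicitly so their influence is visible. The toy points are invented for illustration. They show how a single outlier can take a center for itself and merge two otherwise separate groups, the weakness the abstract attributes to k-means.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(data, initial_centers, max_iter=100):
    """Plain (Lloyd's) k-means with caller-supplied initial centers.

    Returns (centers, labels). An empty cluster keeps its previous center.
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in data:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in data]
    return centers, labels


# Two well-separated groups: k-means recovers them.
clean = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, labels = kmeans(clean, [(0.0, 0.0), (10.0, 10.0)])
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]

# One distant outlier captures a center, and the two real groups merge.
with_outlier = clean + [(100.0, 100.0)]
centers, labels = kmeans(with_outlier, [(0.0, 0.0), (10.0, 10.0)])
print(centers)  # [(5.5, 5.5), (100.0, 100.0)]
print(labels)   # [0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Because each center is a plain mean, one extreme value is enough to drag a center away from its group. Handling such outliers and noisy data is what the proposed tri-level variant is reported to do.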
DOI:	10.1016/j.asoc.2017.08.032