Two improved k-means algorithms

Detailed bibliography
Published in:	Applied Soft Computing, Vol. 68, pp. 747-755
Main authors:	Yu, Shyr-Shen; Chu, Shao-Wei; Wang, Chuin-Mu; Chan, Yung-Kuan; Chang, Ting-Cheng
Medium:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 July 2018
Subjects:	Genetic algorithm; k-Means algorithm; Noise data; Online machine learning; Outlier
ISSN:	1568-4946, 1872-9681

Description
Summary:	The k-means algorithm is the most commonly used simple clustering method. For large volumes of high-dimensional numerical data, it provides an efficient way to group similar data into the same cluster. In this study, a tri-level k-means algorithm and a bi-layer k-means algorithm are proposed. The k-means algorithm is vulnerable to outliers and noisy data, and it is also sensitive to the choice of initial cluster centers; the tri-level k-means algorithm can overcome these drawbacks. When the data in a dataset S change frequently, the trained cluster centers eventually no longer describe the data in each cluster precisely, so they need to be updated. To solve this problem, the paper also provides an online machine-learning-based tri-level k-means algorithm. When the data within a cluster differ significantly, a single cluster center cannot precisely describe every datum in that cluster. Noisy data, outliers, and widely differing values within the same cluster may degrade the performance of pattern matching systems; the bi-layer k-means algorithm can deal with these problems. In addition, a genetic-based algorithm is provided to derive the fittest parameters for the tri-level and bi-layer k-means algorithms. Experimental results show that both algorithms achieve much higher classification accuracy than the traditional k-means algorithm.
DOI:	10.1016/j.asoc.2017.08.032
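The summary names the weaknesses of plain k-means (sensitivity to outliers, noisy data, and initial cluster centers) and the need to keep centers current as the data change, but it does not specify how the tri-level or bi-layer algorithms work. As a minimal sketch for orientation only, the Python below implements the standard Lloyd-style k-means baseline and a generic incremental (running-mean) center update in the style of MacQueen's online k-means; neither is the authors' method. The function names (`squared_distance`, `mean`, `nearest`, `kmeans`, `online_update`) and the `init`, `max_iter`, and `seed` parameters are hypothetical, chosen for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))


def kmeans(data: List[Point], k: int, init: Optional[List[Point]] = None,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Baseline (Lloyd-style) k-means, NOT the paper's tri-level/bi-layer variants.

    Returns (centers, labels). Hypothetical illustrative helper.
    """
    # Initial centers: given explicitly, or k data points drawn at random.
    # The final result can depend strongly on this choice.
    centers = list(init) if init is not None else random.Random(seed).sample(data, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest center.
        labels = [nearest(p, centers) for p in data]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels


def online_update(center: Point, count: int, x: Point) -> Tuple[Point, int]:
    """Fold one new point x into a center that is the mean of `count` points.

    Generic running-mean update (assumption: not the paper's online scheme).
    """
    count += 1
    return tuple(c + (xi - c) / count for c, xi in zip(center, x)), count


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    _, good = kmeans(data, 2, init=[(0, 0), (11, 11)])
    _, bad = kmeans(data, 2, init=[(0, 1), (1, 0)])
    print(good)  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(bad)   # [0, 0, 1, 0, 0, 0, 1, 0]
```

On this toy dataset, starting from centers (0, 0) and (11, 11) separates the two groups, while starting from (0, 1) and (1, 0) converges to a poor split in which each cluster mixes points from both groups. That is the initialization sensitivity the summary refers to. The running-mean update lets a center absorb new data one point at a time without re-running the batch algorithm.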