Two improved k-means algorithms




Bibliographic Details
Published in:	Applied Soft Computing Vol. 68; pp. 747-755
Main Authors:	Yu, Shyr-Shen, Chu, Shao-Wei, Wang, Chuin-Mu, Chan, Yung-Kuan, Chang, Ting-Cheng
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 01.07.2018
Subjects:	Genetic algorithm; k-Means algorithm; Noise data; Online machine learning; Outlier
ISSN:	1568-4946, 1872-9681

Description
Summary:	The k-means algorithm is the most commonly used simple clustering method. For large volumes of high-dimensional numerical data, it provides an efficient way to group similar data into the same cluster. In this study, a tri-level k-means algorithm and a bi-layer k-means algorithm are proposed. The k-means algorithm is vulnerable to outliers and noisy data, and is also sensitive to the choice of initial cluster centers; the tri-level k-means algorithm can overcome these drawbacks. When the data in a dataset S change frequently, the trained cluster centers may, after a period of time, no longer describe the data in each cluster precisely, so the cluster centers need to be updated. To solve this problem, the paper also provides an online-machine-learning-based tri-level k-means algorithm. When the data within a cluster differ significantly, a single cluster center cannot precisely describe every datum in that cluster. Noisy data, outliers, and widely varying data within the same cluster may degrade the performance of pattern-matching systems; the bi-layer k-means algorithm can deal with these problems. In addition, a genetic-based algorithm is provided to derive the fittest parameters for the tri-level and bi-layer k-means algorithms. Experimental results demonstrate that both algorithms can provide much better classification accuracy than the traditional k-means algorithm.
DOI:	10.1016/j.asoc.2017.08.032
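
The summary compares the proposed algorithms against the traditional k-means algorithm but gives no details of either. For orientation only, here is a minimal Python sketch of one common form of traditional k-means (Lloyd's alternating assignment and update steps), assuming Euclidean distance and random selection of k data points as initial centers; that random start is where the dependence on initial cluster centers mentioned in the summary comes from. The helper online_update is a hypothetical, generic running-mean update (MacQueen-style) showing how a center can absorb new data without retraining; it is not the paper's online tri-level algorithm, and the tri-level, bi-layer, and genetic parameter search are not reproduced here. All function names are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Point, centers: List[List[float]]) -> int:
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def kmeans(
    data: List[Point],
    k: int,
    max_iter: int = 100,
    seed: int = 0,
    init: Optional[List[Point]] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Traditional k-means (Lloyd's iteration); returns (centers, labels).

    If `init` is None, k data points are drawn at random as the initial
    centers, so different seeds can lead to different clusterings.
    """
    if init is None:
        init = random.Random(seed).sample(data, k)
    centers = [list(c) for c in init]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest center.
        new_labels = [nearest_center(p, centers) for p in data]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def online_update(center: List[float], count: int, point: Point) -> Tuple[List[float], int]:
    """Fold one new datum into a center as a running mean (MacQueen-style).

    Hypothetical helper: a generic incremental update, not the paper's
    online tri-level procedure.
    """
    count += 1
    return [c + (x - c) / count for c, x in zip(center, point)], count


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    centers, labels = kmeans(data, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
    print(labels)  # [0, 0, 0, 1, 1, 1]
    # A new datum arrives for cluster 1, which currently has 3 members.
    centers[1], n1 = online_update(centers[1], 3, (8.4, 8.0))
    print(centers[1], n1)
```

Passing explicit init centers makes a run reproducible; with init=None, different seed values can yield different clusterings, which illustrates the sensitivity to initial centers that the tri-level algorithm is described as overcoming.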