A gradient ascent algorithm based on possibilistic fuzzy C-Means for clustering noisy data

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 191, p. 116153
Main Authors: Saberi, Hossein, Sharbati, Reza, Farzanegan, Behzad
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd, 01.04.2022
Elsevier BV
ISSN: 0957-4174, 1873-6793
Description
Summary: Real-world data are often corrupted by noise and outliers, which originate from procedures such as data collection, storage, and processing. Noise and outliers degrade clustering quality and lead to inaccurate, misplaced cluster centers. In this paper, we propose a new algorithm called Improved Possibilistic Fuzzy C-Means (IPFCM) to cluster noisy data. First, initial cluster centers are calculated by Possibilistic Fuzzy C-Means (PFCM); these centers do not match the dense regions of the data. Then, the domain is divided into subdomains, and each data point is assigned to a subdomain. The cluster centers are iteratively moved towards high-density regions by maximizing a novel cluster validity index. In the proposed method, a Gaussian membership function is defined on each cluster to weight the data, the weights in each cluster are summed, and the product of these sums is taken as the validity index. Since the division of the domain changes as the cluster centers move, this procedure is repeated until the convergence criterion is satisfied. Cluster analysis performed on six synthetic and nine real benchmark datasets shows the superiority of IPFCM over previous clustering algorithms such as Fuzzy C-Means (FCM), PFCM, Kernel Fuzzy C-Means (KFCM), Noise Clustering (NC), and Generalized Entropy based Possibilistic Fuzzy C-Means (GEPFCM). The clustering results of near-fault ground motion data indicate that the cluster centers identified by IPFCM are well separated from each other, while those found by PFCM are close to each other in some datasets. Moreover, the results show that the impact of noisy data on the proposed index, and consequently on the cluster analysis, decreases as the noisy data move away from the cluster centers, which is one of the advantages of the IPFCM algorithm.
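The summary names PFCM as the source of the initial cluster centers. As a rough illustration only, the sketch below follows the commonly cited PFCM update rules (fuzzy memberships u, typicalities t, centers weighted by a*u^m + b*t^eta). The farthest-point seeding, the short FCM warm-up used to estimate gamma_i, and all parameter defaults (m, eta, a, b, K) are assumptions for this example, not settings taken from the paper.

```python
def sq_dist(x, v):
    """Squared Euclidean distance between two points."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def fuzzy_memberships(data, centers, m):
    """FCM memberships: u[i][k] = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    eps = 1e-12  # avoids division by zero when a point sits on a center
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        d2 = [sq_dist(x, v) + eps for v in centers]
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1)) for j in range(len(centers)))
    return u


def weighted_centers(data, w):
    """v_i = sum_k w[i][k] * x_k / sum_k w[i][k]."""
    dim = len(data[0])
    return [[sum(wk * x[d] for wk, x in zip(wi, data)) / sum(wi) for d in range(dim)] for wi in w]


def pfcm(data, c, m=2.0, eta=2.0, a=1.0, b=1.0, K=1.0, fcm_iters=20, iters=100, tol=1e-10):
    # Assumed seeding: first point, then repeatedly the point farthest from chosen centers.
    centers = [list(data[0])]
    while len(centers) < c:
        centers.append(list(max(data, key=lambda x: min(sq_dist(x, v) for v in centers))))
    # Assumed warm-up: plain FCM, then gamma_i = K * sum u^m d^2 / sum u^m.
    for _ in range(fcm_iters):
        u = fuzzy_memberships(data, centers, m)
        centers = weighted_centers(data, [[uik ** m for uik in ui] for ui in u])
    u = fuzzy_memberships(data, centers, m)
    gamma = []
    for i, v in enumerate(centers):
        um = [uik ** m for uik in u[i]]
        gamma.append(K * sum(w * sq_dist(x, v) for w, x in zip(um, data)) / sum(um))
    # PFCM iterations: typicality t_ik = 1 / (1 + (b * d_ik^2 / gamma_i)^(1/(eta-1))).
    for _ in range(iters):
        u = fuzzy_memberships(data, centers, m)
        t = [[1.0 / (1.0 + (b * sq_dist(x, v) / gamma[i]) ** (1.0 / (eta - 1))) for x in data]
             for i, v in enumerate(centers)]
        w = [[a * u[i][k] ** m + b * t[i][k] ** eta for k in range(len(data))] for i in range(c)]
        new_centers = weighted_centers(data, w)
        shift = max(sq_dist(v, nv) for v, nv in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

On two compact, well-separated groups of points, `pfcm(data, 2)` should return centers close to the two group means; the summary's point is that on noisy data such centers can drift away from the dense regions, which motivates the refinement step in IPFCM.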
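The refinement step described in the summary (split the domain, weight each cluster's points with a Gaussian, sum the weights, take the product as the index, and move centers uphill) can be sketched as follows. Several details are assumptions not stated in the summary: subdomains are taken to be nearest-center (Voronoi) cells, every cluster uses one fixed width `sigma`, and the ascent is performed on log J = sum_i log S_i, which has the same maximizers as the product J but is better scaled. The functions `assign`, `validity_index`, and `ascend` are hypothetical names, not the authors' code.

```python
import math


def sq_dist(x, v):
    """Squared Euclidean distance between two points."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def assign(data, centers):
    """Hypothetical subdomain rule: each point goes to its nearest center."""
    return [min(range(len(centers)), key=lambda i: sq_dist(x, centers[i])) for x in data]


def validity_index(data, centers, labels, sigma):
    """J = product over clusters of S_i, the summed Gaussian weights of its points."""
    sums = [0.0] * len(centers)
    for x, i in zip(data, labels):
        sums[i] += math.exp(-sq_dist(x, centers[i]) / (2 * sigma ** 2))
    return math.prod(sums)


def ascend(data, centers, sigma=1.0, lr=0.5, inner=50, outer=20, tol=1e-9):
    centers = [list(v) for v in centers]  # do not mutate the caller's centers
    dim = len(data[0])
    for _ in range(outer):
        labels = assign(data, centers)  # re-divide the domain around current centers
        members = [[x for x, l in zip(data, labels) if l == i] for i in range(len(centers))]
        old = [list(v) for v in centers]
        for _ in range(inner):  # gradient ascent with the partition held fixed
            for i, v in enumerate(centers):
                w = [math.exp(-sq_dist(x, v) / (2 * sigma ** 2)) for x in members[i]]
                s = sum(w)
                if s == 0.0:
                    continue  # empty subdomain: leave this center where it is
                # d(log S_i)/d v_i = sum_k w_k (x_k - v_i) / (sigma^2 * S_i)
                grad = [sum(wk * (x[d] - v[d]) for wk, x in zip(w, members[i])) / (sigma ** 2 * s)
                        for d in range(dim)]
                centers[i] = [v[d] + lr * grad[d] for d in range(dim)]
        if max(sq_dist(a, b) for a, b in zip(old, centers)) < tol:
            break  # centers stopped moving, so the partition is stable
    return centers
```

Starting from centers displaced toward each other, `ascend` pulls each one back into its dense region, and a distant outlier contributes almost nothing to S_i because its Gaussian weight is tiny. This mirrors the summary's observation that remote noise has little effect on the index. With this gradient, `lr = sigma**2` would turn each step into a Gaussian-weighted mean of the subdomain's points (a mean-shift-like update); a smaller `lr` simply takes more cautious steps.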
DOI: 10.1016/j.eswa.2021.116153