Optimization and Parallelization of Fuzzy Clustering Algorithm Based on the Improved Kmeans++ Clustering

In the field of big data, fuzzy clustering algorithm is one of the most widely used clustering algorithm. Although fuzzy c-means (FCM) algorithm performs well, it still is difficult to ascertain the number of clusters and is sensitive to initial clustering center. To solve these problems, an improve...

Full description

Saved in:
Bibliographic Details
Published in:IOP conference series. Materials Science and Engineering Vol. 768; no. 7; pp. 72106 - 72112
Main Authors: Ma, Yu, Cheng, Wenjuan
Format: Journal Article
Language:English
Published: Bristol IOP Publishing 01.03.2020
Subjects:
ISSN:1757-8981, 1757-899X
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In the field of big data, fuzzy clustering algorithm is one of the most widely used clustering algorithm. Although fuzzy c-means (FCM) algorithm performs well, it still is difficult to ascertain the number of clusters and is sensitive to initial clustering center. To solve these problems, an improved fuzzy clustering algorithm based on Spark is proposed in this article. The proposed algorithm integrates the L2 norm and uses the kmeans++ algorithm improved by the Canopy algorithm to initialize the cluster center. Experimental results show that the proposed algorithm performs well in clustering accuracy and computational performance.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1757-8981
1757-899X
DOI:10.1088/1757-899X/768/7/072106