Accelerating Fuzzy-C Means Using an Estimated Subsample Size

Bibliographic Details
Published in: IEEE Transactions on Fuzzy Systems, Vol. 22, no. 5, pp. 1229-1244
Main Authors: Parker, Jonathon K.; Hall, Lawrence O.
Format: Journal Article
Language: English
Published: United States, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2014
ISSN: 1063-6706, 1941-0034
Description
Summary: Many algorithms designed to accelerate the fuzzy c-means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, even though the subsample size affects both speed and quality. This paper introduces two new accelerated algorithms, geometric progressive fuzzy c-means (GOFCM) and minimum sample estimate random fuzzy c-means (MSERFCM), that use a statistical method to estimate the subsample size. GOFCM, a variant of single-pass fuzzy c-means (SPFCM), also leverages progressive sampling. MSERFCM, a variant of random sampling plus extension fuzzy c-means, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared with FCM and four accelerated variants of FCM. GOFCM achieved a speedup of 4 to 47 times over FCM and was faster than SPFCM on each of the six datasets used in the experiments. For five of the datasets, its partitions were within 1% of those of FCM. MSERFCM achieved a speedup of 5 to 26 times over FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset consisting of plankton images exposed the strengths and weaknesses of many of the algorithms tested. The new stopping criterion is shown to speed up algorithms such as SPFCM while yielding final partitions very close to those of FCM.
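The sampling idea the summary refers to can be illustrated with a minimal sketch: run plain FCM on a random subsample, then extend the resulting centers to every point by computing memberships once over the full data. This is not the authors' GOFCM or MSERFCM. The function names `memberships`, `fcm`, and `sampled_fcm` are hypothetical, and `sample_size` is simply passed in by the caller, whereas the paper estimates it statistically (that estimate is not reproduced here).

```python
import random


def memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[k][i] is the degree of point k in cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Euclidean distance to each center, floored to avoid division by zero.
        dists = [max(sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5, 1e-12)
                 for v in centers]
        inv = [d ** -exponent for d in dists]
        total = sum(inv)
        u.append([x / total for x in inv])
    return u


def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means on a list of equal-length tuples; returns the c centers."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(p) for p in rng.sample(points, c)]  # random initial centers
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)])
        # Largest coordinate change of any center in this iteration.
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers


def sampled_fcm(points, c, sample_size, m=2.0, seed=0):
    """Cluster a random subsample, then extend the centers to the full dataset.

    sample_size is an assumption supplied by the caller, not a statistical estimate.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    centers = fcm(sample, c, m=m, seed=seed)
    return centers, memberships(points, centers, m)
```

Because the iterative loop touches only `sample_size` points, its cost per iteration shrinks in proportion to the sampling ratio; the full dataset is visited once in the final extension step. How small the sample can be before partition quality degrades is the question the paper's statistical estimate addresses.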
Author e-mail: jkparker@mail.usf.edu, hall@csee.usf.edu
DOI: 10.1109/TFUZZ.2013.2286993