Research on Mini-Batch Affinity Propagation Clustering Algorithm

Bibliographic Details
Published in: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1-10
Main Authors: Xu, Ziqi; Lu, Yahui; Jiang, Yu
Format: Conference Proceeding
Language: English
Published: IEEE 13.10.2022
Description
Summary: Clustering is an unsupervised learning task that aims to group data so that items in the same group are more similar to each other than to items in other groups. Affinity propagation (AP) is a clustering algorithm that finds exemplars (representative points) for data points by passing messages among them. The AP algorithm has several drawbacks. First, it is time- and memory-intensive on large-scale datasets because of its O(N^2) time and space complexity, where N is the number of data points. Second, AP may produce too many small clusters. Third, AP may have difficulty converging, which increases the time spent on fine-tuning. To achieve better effectiveness and efficiency, in this paper we propose Mini-Batch Affinity Propagation (MBAP). MBAP processes small batches of data serially and obtains clustering results gradually. We also propose MBAP with early stopping (MBAP_ES), which integrates MBAP with a stopping strategy so that clustering can stop early when the model is nearly unchanged. The experiments show the effectiveness and efficiency of MBAP and MBAP_ES in comparison to other AP-based algorithms.
DOI: 10.1109/DSAA54385.2022.10032450
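
The quadratic space cost mentioned in the summary can be made concrete with a little arithmetic. The sketch below is not taken from the paper: it assumes a dense AP implementation that holds three N x N float64 matrices (similarity, responsibility, availability), and the dataset size of 100,000 points and batch size of 1,000 points are illustrative values chosen here. It also ignores whatever state MBAP carries from one batch to the next, so the per-batch figure is a lower bound on what a mini-batch method would need.

```python
# Rough memory estimate for dense affinity propagation (AP).
# Assumption: three dense N x N float64 matrices are kept in memory
# (similarity, responsibility, availability). Real implementations vary.

BYTES_PER_FLOAT64 = 8
DENSE_MATRICES = 3  # similarity, responsibility, availability


def ap_dense_bytes(n_points: int) -> int:
    """Bytes needed for the dense AP matrices over n_points points."""
    return DENSE_MATRICES * n_points * n_points * BYTES_PER_FLOAT64


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return n_bytes / 2**30


# Illustrative sizes (hypothetical, not from the paper).
full_dataset = 100_000
batch_size = 1_000

full_bytes = ap_dense_bytes(full_dataset)
batch_bytes = ap_dense_bytes(batch_size)

print(f"full dataset: {to_gib(full_bytes):.1f} GiB")
print(f"one batch:    {to_gib(batch_bytes):.4f} GiB")
print(f"ratio:        {full_bytes // batch_bytes}x")
```

Because the cost grows with the square of the number of points, shrinking the working set by a factor of 100 shrinks the dense matrices by a factor of 10,000, which is the kind of saving that motivates processing data in small batches.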