Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing

The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repe...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Knowledge and information systems Jg. 12; H. 1; S. 25 - 53
Hauptverfasser: Koga, Hisashi, Ishibashi, Tetsuo, Watanabe, Toshinori
Format: Journal Article
Sprache:Englisch
Veröffentlicht: London Springer Nature B.V 01.05.2007
Schlagworte:
ISSN:0219-1377, 0219-3116
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repeated until only one cluster remains. The single linkage method can efficiently detect clusters in arbitrary shapes. However, a drawback of this method is a large time complexity of O(n2), where n represents the number of data points. This time complexity makes this method infeasible for large data. This paper proposes a fast approximation algorithm for the single linkage method. Our algorithm reduces the time complexity to O(nB) by rapidly finding the near clusters to be connected by Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. Here, B represents the maximum number of points going into a single hash entry and it practically diminishes to a small constant as compared to n for sufficiently large hash tables. Experimentally, we show that (1) the proposed algorithm obtains clustering results similar to those obtained by the single linkage method and (2) it runs faster for large data than the single linkage method.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0219-1377
0219-3116
DOI:10.1007/s10115-006-0027-5