Exact memory–constrained UPGMA for large scale speaker clustering


Full Description

Bibliographic Details
Published in: Pattern Recognition, Vol. 95, pp. 235–246
Main authors: Cumani, Sandro; Laface, Pietro
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2019
ISSN:0031-3203, 1873-5142
Description

Abstract:

• We focus on exact hierarchical clustering of large sets of utterances.
• Hierarchical clustering is challenging due to memory constraints.
• We propose an efficient, exact and parallel implementation of UPGMA clustering.
• We extend the Clustering Features concept to speaker recognition scoring functions.
• We assess the efficiency of our method on datasets including 4 million utterances.

This work focuses on clustering large sets of utterances collected from an unknown number of speakers. Since the number of speakers is unknown, we rely on exact hierarchical agglomerative clustering, followed by automatic selection of the number of clusters. Exact hierarchical clustering of a large number of vectors, however, is a challenging task due to memory constraints, which make it ineffective or infeasible for large datasets. We propose an exact, memory-constrained, and parallel implementation of average-linkage clustering for large-scale datasets, showing that its computational complexity is approximately O(N²), yet it is much faster (up to 40 times in our experiments) than the Reciprocal Nearest Neighbor chain algorithm, which also has O(N²) complexity. We also propose a very fast silhouette computation procedure that determines the set of clusters in linear time. The computational efficiency of our approach is demonstrated on datasets including up to 4 million speaker vectors.
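For context, the clustering the abstract refers to is standard average-linkage (UPGMA) agglomeration. The record does not include the paper's memory-constrained parallel algorithm, so the following is only a naive textbook sketch of UPGMA using the Lance-Williams average-linkage update, illustrating the exact merge sequence the paper computes efficiently; all names here (`upgma`, `merges`) are illustrative, not from the paper.

```python
def upgma(D):
    """Naive O(N^3) average-linkage (UPGMA) agglomeration.

    D is a symmetric N x N distance matrix (list of lists).
    Returns the merge sequence [(i, j, height), ...]; the merged
    cluster keeps index i.  Distances to the merged cluster follow
    the Lance-Williams average-linkage update:
        d(k, i+j) = (n_i * d(k, i) + n_j * d(k, j)) / (n_i + n_j)
    """
    D = [row[:] for row in D]          # work on a copy
    n = len(D)
    active = list(range(n))            # indices of live clusters
    size = [1] * n                     # cluster cardinalities
    merges = []
    while len(active) > 1:
        # exhaustive search for the closest active pair: the step the
        # paper reorganizes to be memory-efficient and parallel
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                i, j = active[a], active[b]
                if best is None or D[i][j] < best[0]:
                    best = (D[i][j], i, j)
        h, i, j = best
        merges.append((i, j, h))
        # average-linkage update of distances to the merged cluster
        for k in active:
            if k != i and k != j:
                d = (size[i] * D[i][k] + size[j] * D[j][k]) / (size[i] + size[j])
                D[i][k] = D[k][i] = d
        size[i] += size[j]
        active.remove(j)
    return merges

# Four 1-D points {0, 1, 10, 11}: two tight pairs merge at height 1,
# then the pairs merge at the average inter-group distance 10.0.
merges = upgma([[0, 1, 10, 11],
                [1, 0, 9, 10],
                [10, 9, 0, 1],
                [11, 10, 1, 0]])
# → [(0, 1, 1), (2, 3, 1), (0, 2, 10.0)]
```

The naive pairwise search above keeps the full N×N matrix in memory, which is exactly what becomes infeasible at 4 million utterances; the paper's contribution is performing the same exact merges without that memory footprint.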
DOI:10.1016/j.patcog.2019.06.018