Scalable Nearest Neighbor Algorithms for High Dimensional Data

Bibliographic details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 36, Issue 11, pp. 2227-2240
Main authors:	Muja, Marius; Lowe, David G.
Format:	Journal Article
Language:	English
Published:	United States, IEEE, 1 November 2014; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Approximation algorithms; Approximation methods; Clustering algorithms; Computer vision; Libraries; Machine learning; Machine learning algorithms; Matching; Partitioning algorithms; Searching; Training; Trees; Vegetation; Nearest neighbor search; algorithm configuration; approximate search; big data
ISSN:	0162-8828, 1939-3539, 2160-9292

Description
Summary:	For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
DOI:	10.1109/TPAMI.2014.2321376
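
The randomized k-d forest named in the summary can be illustrated with a small, standard-library Python sketch. It is not FLANN's implementation or API: the function names (`build_forest`, `search`), the parameters (`top_dims`, `leaf_size`, `max_checks`) and the specific choices made here (split at the mean of a dimension picked at random among the few with highest variance, one priority queue shared by all trees, a cap on how many points are examined) are assumptions chosen to show the general idea of an approximate search that trades accuracy for speed.

```python
import heapq
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def build_tree(points, idxs, rng, top_dims=5, leaf_size=4):
    """Build one randomized k-d tree over the point indices in `idxs`."""
    if len(idxs) <= leaf_size:
        return ("leaf", idxs)
    # Rank dimensions by variance, then pick one of the top few at random
    # (assumption: this randomness is what makes the trees differ).
    ranked = sorted(range(len(points[0])),
                    key=lambda d: variance([points[i][d] for i in idxs]),
                    reverse=True)
    dim = rng.choice(ranked[:top_dims])
    cut = sum(points[i][dim] for i in idxs) / len(idxs)
    left = [i for i in idxs if points[i][dim] < cut]
    right = [i for i in idxs if points[i][dim] >= cut]
    if not left or not right:  # degenerate split, e.g. constant dimension
        return ("leaf", idxs)
    return ("node", dim, cut,
            build_tree(points, left, rng, top_dims, leaf_size),
            build_tree(points, right, rng, top_dims, leaf_size))


def build_forest(points, n_trees=4, seed=0):
    """Build several randomized trees over the same points."""
    rng = random.Random(seed)
    all_idxs = list(range(len(points)))
    return [build_tree(points, all_idxs, rng) for _ in range(n_trees)]


def search(points, forest, query, max_checks=64):
    """Approximate nearest neighbor: returns (index, squared distance).

    Branches from all trees wait in one heap ordered by a lower bound on
    their distance to the query. The search stops once roughly
    `max_checks` distinct points have been examined.
    """
    best_d, best_i = float("inf"), -1
    checked = set()
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), tree) for tree in forest]
    heapq.heapify(heap)
    while heap and len(checked) < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # no remaining branch can contain a closer point
        while node[0] == "node":
            _, dim, cut, left, right = node
            diff = query[dim] - cut
            near, far = (left, right) if diff < 0 else (right, left)
            # The far side lies at least |diff| away along `dim`.
            heapq.heappush(heap, (max(bound, diff * diff), next(tie), far))
            node = near
        for i in node[1]:
            if i not in checked:
                checked.add(i)
                d = sq_dist(points[i], query)
                if d < best_d:
                    best_d, best_i = d, i
    return best_i, best_d
```

Calling `search(points, build_forest(points), query, max_checks=32)` returns a candidate neighbor after examining only a small part of the data. In this sketch, raising `max_checks` to `len(points)` makes the result exact, because the loop then ends only when every remaining branch is provably farther than the best match found.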