Distributed Sparse Random Projection Trees for Constructing K-Nearest Neighbor Graphs

A random projection tree that partitions data points by projecting them onto random vectors is widely used for approximate nearest neighbor search in high-dimensional space. We consider a particular case of random projection trees for constructing a k-nearest neighbor graph (KNNG) from high-dimensio...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 36 - 46
Main Authors: Ranawaka, Isuru, Rahman, Md Khaledur, Azad, Ariful
Format: Conference Proceeding
Language:English
Published: IEEE 01.05.2023
Subjects:
ISSN:1530-2075
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:A random projection tree that partitions data points by projecting them onto random vectors is widely used for approximate nearest neighbor search in high-dimensional space. We consider a particular case of random projection trees for constructing a k-nearest neighbor graph (KNNG) from high-dimensional data. We develop a distributed-memory Random Projection Tree (DRPT) algorithm for constructing sparse random projection trees and then running a query on the forest to create the KNN graph. DRPT uses sparse matrix operations and a communication reduction scheme to scale KNN graph constructions to thousands of processes on a supercomputer. The accuracy of DRPT is comparable to state-of-the-art methods for approximate nearest neighbor search, while it runs two orders of magnitude faster than its peers. DRPT is available at https://github.com/HipGraph/DRPT.
ISSN:1530-2075
DOI:10.1109/IPDPS54959.2023.00014