Multi-resolution Hashing for Fast Pairwise Summations

Bibliographic Details
Published in:	Proceedings / annual Symposium on Foundations of Computer Science, pp. 769-792
Main Authors:	Charikar, Moses; Siminelakis, Paris
Format:	Conference paper
Language:	English
Published:	IEEE, 1 November 2019
Subjects:	Anomaly detection; Approximation algorithms; Computer science; Data structures; Harmonic analysis; Hashing; Importance Sampling; Kernel; Kernel Density; Partition Function Estimation; Partitioning algorithms; Sub-linear algorithms
ISSN:	2575-8454

Description
Summary:	A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (the query) that is unknown a priori. Given a set of points X and a pairwise function w(x,y), we study the problem of designing a data structure that enables sub-linear time approximation of the sum of w(x,y) over all x in X, for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x,y). This leads to a data structure that approximates pairwise summations using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density, Partition Function Estimation and sampling.
DOI:	10.1109/FOCS.2019.00051
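
The hashing-based estimator idea mentioned in the summary can be illustrated with a small sketch. It is not the paper's construction: it uses a single hash family instead of a multi-resolution collection, it swaps in SimHash (random hyperplanes) rather than Distance Sensitive Hashing, and it redraws the hash function and rescans the dataset for every sample, whereas a real data structure would build its hash tables once in preprocessing. All names (`hbe_sample`, `hbe_estimate`, `exp_inner`) and parameter values are hypothetical. For unit vectors at angle theta, k random hyperplanes collide with probability (1 - theta/pi)^k, so reweighting a random bucket member by bucket size and inverse collision probability gives an unbiased estimate of the mean of w(x,y) over x in X.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def simhash(v, planes):
    # One sign bit per random hyperplane.
    return tuple(dot(v, p) >= 0.0 for p in planes)


def collision_prob(x, y, k):
    # SimHash collision probability for UNIT vectors x, y using k hyperplanes.
    cos = max(-1.0, min(1.0, dot(x, y)))
    return (1.0 - math.acos(cos) / math.pi) ** k


def hbe_sample(X, y, w, k, rng):
    # Draw a fresh hash function, look up the query's bucket, and reweight
    # one random bucket member. Assumption: rescanning X here stands in for
    # a lookup in a hash table built ahead of time.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(len(y))] for _ in range(k)]
    key = simhash(y, planes)
    bucket = [x for x in X if simhash(x, planes) == key]
    if not bucket:
        return 0.0
    x = rng.choice(bucket)
    return (len(bucket) / len(X)) * w(x, y) / collision_prob(x, y, k)


def hbe_estimate(X, y, w, k=2, reps=2000, seed=0):
    # Average independent samples; each one is unbiased for (1/n) * sum_x w(x, y).
    rng = random.Random(seed)
    return sum(hbe_sample(X, y, w, k, rng) for _ in range(reps)) / reps


def exp_inner(x, y):
    # Hypothetical example weight: a log-convex function of the inner product.
    return math.exp(dot(x, y))


data_rng = random.Random(1)
X = [unit([data_rng.gauss(0.0, 1.0) for _ in range(5)]) for _ in range(100)]
y = unit([data_rng.gauss(0.0, 1.0) for _ in range(5)])

exact = sum(exp_inner(x, y) for x in X) / len(X)
approx = hbe_estimate(X, y, exp_inner)
print(f"exact mean = {exact:.4f}, hashing-based estimate = {approx:.4f}")
```

The sketch only checks unbiasedness on a toy dataset. The variance reduction that the summary attributes to matching collision probabilities to the square root of w(x,y) is not reproduced here.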