Multi-resolution Hashing for Fast Pairwise Summations

A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a...

Bibliographic details
Published in: Proceedings / annual Symposium on Foundations of Computer Science, pp. 769-792
Main authors: Charikar, Moses; Siminelakis, Paris
Format: Conference paper
Language: English
Publication details: IEEE, 1 November 2019
Subjects:
ISSN: 2575-8454
Abstract A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a pairwise function w(x,y), we study the problem of designing a data structure that enables sub-linear time approximation of the summation of w(x,y) over all x in X, for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x,y). This leads to a data structure that approximates pairwise summations using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumüller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density, Partition Function Estimation and sampling.
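For orientation, the problem stated in the abstract can be sketched in a few lines of Python. This is only an illustration of the exact summation and of the uniform random sampling baseline that the abstract says the method improves upon; it is not the hashing-based data structure from the paper. All names here (exact_sum, sampled_sum, exp_inner_product) are hypothetical, and the weight exp(<x, y>) is chosen as an assumed example of a log-convex function of the inner product.

```python
import math
import random


def exact_sum(points, y, w):
    """Exact pairwise summation: add w(x, y) over every x (linear time)."""
    return sum(w(x, y) for x in points)


def sampled_sum(points, y, w, num_samples, rng):
    """Uniform random sampling baseline (NOT the paper's method).

    Draws num_samples points uniformly with replacement and rescales
    the average weight by n, which gives an unbiased estimate of the sum.
    """
    n = len(points)
    total = 0.0
    for _ in range(num_samples):
        x = points[rng.randrange(n)]
        total += w(x, y)
    return n * total / num_samples


def exp_inner_product(x, y):
    """Assumed example weight: exp(<x, y>), log-convex in the inner product."""
    return math.exp(sum(a * b for a, b in zip(x, y)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy dataset: 1000 points in the square [-1, 1]^2.
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
    query = (0.5, 0.5)
    print("exact  :", exact_sum(points, query, exp_inner_product))
    print("sampled:", sampled_sum(points, query, exp_inner_product, 200, rng))
```

Uniform sampling works well when the weights w(x,y) are of comparable size, but its variance grows when a few points dominate the sum. That is the regime where, per the abstract, sampling through hash families whose collision probabilities track the square root of w(x,y) is meant to help.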
Author Siminelakis, Paris
Charikar, Moses
Author_xml – sequence: 1
  givenname: Moses
  surname: Charikar
  fullname: Charikar, Moses
  organization: Stanford University
– sequence: 2
  givenname: Paris
  surname: Siminelakis
  fullname: Siminelakis, Paris
  organization: Stanford University
ContentType Conference Proceeding
DOI 10.1109/FOCS.2019.00051
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Mathematics
Computer Science
EISBN 1728149525
9781728149523
EISSN 2575-8454
EndPage 792
ExternalDocumentID 8948684
Genre orig-research
ISICitedReferencesCount 2
IsPeerReviewed false
IsScholarly true
Language English
PageCount 24
PublicationCentury 2000
PublicationDate 2019-Nov.
PublicationDateYYYYMMDD 2019-11-01
PublicationDate_xml – month: 11
  year: 2019
  text: 2019-Nov.
PublicationDecade 2010
PublicationTitle Proceedings / annual Symposium on Foundations of Computer Science
PublicationTitleAbbrev SFCS
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 769
SubjectTerms Anomaly detection
Approximation algorithms
Computer science
Data structures
Harmonic analysis
Hashing
Importance Sampling
Kernel
Kernel Density
Partition Function Estimation
Partitioning algorithms
Sub linear algorithms
Title Multi-resolution Hashing for Fast Pairwise Summations
URI https://ieeexplore.ieee.org/document/8948684