Multi-resolution Hashing for Fast Pairwise Summations
A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a...
| Published in: | Proceedings / annual Symposium on Foundations of Computer Science, pp. 769-792 |
|---|---|
| Main authors: | Charikar, Moses; Siminelakis, Paris |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 1 November 2019 |
| Subjects: | |
| ISSN: | 2575-8454 |
| Online access: | Get full text |
| Abstract | A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a pairwise function w(x,y), we study the problem of designing a data-structure that enables sub-linear time approximation of the summation of w(x,y) for all x in X for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x,y). This leads to a data-structure that approximates pairwise summations using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density, Partition Function Estimation and sampling. |
|---|---|
| Author | Siminelakis, Paris; Charikar, Moses |
| Author_xml | 1. Charikar, Moses (Stanford University); 2. Siminelakis, Paris (Stanford University) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/FOCS.2019.00051 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Mathematics; Computer Science |
| EISBN | 1728149525; 9781728149523 |
| EISSN | 2575-8454 |
| EndPage | 792 |
| ExternalDocumentID | 8948684 |
| Genre | orig-research |
| GroupedDBID | --Z 29O 6IE 6IH 6IK ALMA_UNASSIGNED_HOLDINGS CBEJK RIE RIO |
| ID | FETCH-LOGICAL-i203t-d56e08bc675a8041852562830040f4ece6d98b081dcecc03c95bcaa6a9c7b33f3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000510015300042&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:31:38 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-d56e08bc675a8041852562830040f4ece6d98b081dcecc03c95bcaa6a9c7b33f3 |
| PageCount | 24 |
| ParticipantIDs | ieee_primary_8948684 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Nov. |
| PublicationDateYYYYMMDD | 2019-11-01 |
| PublicationDate_xml | – month: 11 year: 2019 text: 2019-Nov. |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings / annual Symposium on Foundations of Computer Science |
| PublicationTitleAbbrev | SFCS |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0040504 |
| Score | 2.1729894 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 769 |
| SubjectTerms | Anomaly detection; Approximation algorithms; Computer science; Data structures; Harmonic analysis; Hashing; Importance Sampling; Kernel; Kernel Density; Partition Function Estimation; Partitioning algorithms; Sub linear algorithms |
| Title | Multi-resolution Hashing for Fast Pairwise Summations |
| URI | https://ieeexplore.ieee.org/document/8948684 |
| WOSCitedRecordID | wos000510015300042&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
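
The abstract describes estimating the summation of w(x,y) over all x in X from a few hashed samples instead of a full scan. The sketch below illustrates only the basic single-family hashing-based estimator idea that the abstract cites [Charikar, Siminelakis FOCS'17]; it is not the paper's multi-resolution construction and does not use Distance Sensitive Hashing. It assumes unit-norm points, the illustrative function w(x,y) = exp(<x,y>), and SimHash (random hyperplanes) as a stand-in hash family whose collision probability is known in closed form. All names (`SimHashTable`, `hbe_estimate`, `collision_prob`, and so on) are hypothetical and chosen for this example.

```python
import math
import random


def unit_vector(rng, dim):
    """Draw a point uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def w(x, y):
    """Illustrative pairwise function (an assumption): exp(<x, y>)."""
    return math.exp(dot(x, y))


def exact_mean(points, y):
    """Linear-time baseline: (1/n) * sum of w(x, y) over all x."""
    return sum(w(x, y) for x in points) / len(points)


def collision_prob(inner, bits):
    """SimHash collision probability for unit vectors: (1 - angle/pi)^bits."""
    inner = max(-1.0, min(1.0, inner))  # guard against rounding outside [-1, 1]
    return (1.0 - math.acos(inner) / math.pi) ** bits


class SimHashTable:
    """One hash table keyed by the signs of `bits` random projections."""

    def __init__(self, points, bits, rng):
        dim = len(points[0])
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(bits)]
        self.buckets = {}
        for i, x in enumerate(points):
            self.buckets.setdefault(self.key(x), []).append(i)

    def key(self, v):
        return tuple(dot(p, v) >= 0.0 for p in self.planes)


def hbe_estimate(points, tables, y, bits, rng):
    """Average one importance-weighted sample per table.

    From each table, pick a uniform point x in the query's bucket and
    reweight w(x, y) by bucket_size / (n * collision probability), which
    makes each per-table sample an unbiased estimate of exact_mean.
    """
    n = len(points)
    total = 0.0
    for table in tables:
        bucket = table.buckets.get(table.key(y), [])
        if not bucket:
            continue  # an empty bucket contributes zero
        x = points[rng.choice(bucket)]
        p = collision_prob(dot(x, y), bits)
        total += w(x, y) * len(bucket) / (n * p)
    return total / len(tables)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [unit_vector(rng, 8) for _ in range(300)]
    query = unit_vector(rng, 8)
    tabs = [SimHashTable(pts, 2, rng) for _ in range(300)]
    print("exact   :", exact_mean(pts, query))
    print("estimate:", hbe_estimate(pts, tabs, query, 2, rng))
```

Each query touches one bucket per table rather than every point. In this toy setting the variance depends on how well the collision probability tracks w(x,y), which is the quantity the abstract's design principle (collision probabilities scaling as the square root of w) is meant to control.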