Multi-resolution Hashing for Fast Pairwise Summations
| Published in: | Proceedings / annual Symposium on Foundations of Computer Science pp. 769 - 792 |
|---|---|
| Main Authors: | Charikar, Moses; Siminelakis, Paris |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 2019-11-01 |
| ISSN: | 2575-8454 |

| Abstract | A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (the query) that is unknown a priori. Given a set of points X and a pairwise function w(x,y), we study the problem of designing a data structure that enables sub-linear time approximation of the sum of w(x,y) over all x in X, for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x,y). This leads to a data structure that approximates pairwise summations using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumüller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density, Partition Function Estimation and sampling. |
|---|---|
| Author | Charikar, Moses; Siminelakis, Paris |
| Affiliations | Charikar, Moses (Stanford University); Siminelakis, Paris (Stanford University) |
| DOI | 10.1109/FOCS.2019.00051 |
| Discipline | Mathematics Computer Science |
| EISBN | 1728149525; 9781728149523 |
| EISSN | 2575-8454 |
| EndPage | 792 |
| ExternalDocumentID | 8948684 |
| Genre | orig-research |
| ISICitedReferencesCount | 2 |
| IsPeerReviewed | false |
| IsScholarly | true |
| PageCount | 24 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Nov. |
| PublicationDateYYYYMMDD | 2019-11-01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings / annual Symposium on Foundations of Computer Science |
| PublicationTitleAbbrev | SFCS |
| PublicationYear | 2019 |
| Publisher | IEEE |
| StartPage | 769 |
| SubjectTerms | Anomaly detection; Approximation algorithms; Computer science; Data structures; Harmonic analysis; Hashing; Importance Sampling; Kernel; Kernel Density; Partition Function Estimation; Partitioning algorithms; Sublinear algorithms |
| Title | Multi-resolution Hashing for Fast Pairwise Summations |
| URI | https://ieeexplore.ieee.org/document/8948684 |
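
The pairwise-summation problem and the hashing-based estimator idea from the abstract can be illustrated with a small toy sketch. It assumes a single SimHash (random-hyperplane) family with k concatenated bits, whose collision probability p(x,y) = (1 - θ/π)^k is known in closed form, and a hypothetical pairwise function w(x,y) defined as p(x,y)^2, so that the collision probability equals the square root of w as in the design principle above. This is not the paper's multi-resolution construction or its Distance Sensitive Hashing families; all names (`HashingBasedEstimator`, `collision_prob`, `weight`, `make_hash`, and so on) are hypothetical. The sketch estimates the average (1/n) Σ w(x,y) by sampling a point uniformly from the query's bucket and reweighting it by bucket size over n·p(x,y), and compares this with plain uniform random sampling.

```python
import math
import random

# Hypothetical toy setup (not the paper's multi-resolution construction):
# a single SimHash family with k bits, and w(x, y) defined as the square
# of its collision probability, so that p(x, y) = sqrt(w(x, y)).


def angle(x, y):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))


def collision_prob(x, y, k):
    """Pr[h(x) == h(y)] for k independent random-hyperplane (SimHash) bits."""
    return (1.0 - angle(x, y) / math.pi) ** k


def weight(x, y, k):
    """Hypothetical pairwise function w(x, y) = collision_prob(x, y, k) ** 2."""
    return collision_prob(x, y, k) ** 2


def make_hash(d, k, rng):
    """Sample one SimHash function: signs of k Gaussian random projections."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]

    def h(v):
        return tuple(sum(p * c for p, c in zip(plane, v)) >= 0.0 for plane in planes)

    return h


class HashingBasedEstimator:
    """m independent hash tables over X; estimates (1/n) * sum_x w(x, y)."""

    def __init__(self, X, k, m, rng):
        self.X, self.k, self.rng = X, k, rng
        d = len(X[0])
        self.tables = []
        for _ in range(m):
            h = make_hash(d, k, rng)
            buckets = {}
            for i, x in enumerate(X):
                buckets.setdefault(h(x), []).append(i)
            self.tables.append((h, buckets))

    def query(self, y):
        n, total = len(self.X), 0.0
        for h, buckets in self.tables:
            bucket = buckets.get(h(y), [])
            if not bucket:
                continue  # an empty bucket contributes an estimate of 0
            x = self.X[self.rng.choice(bucket)]
            p = collision_prob(x, y, self.k)
            if p > 0.0:
                # Importance weighting: x is in y's bucket with probability p,
                # and is chosen from it with probability 1 / len(bucket).
                total += weight(x, y, self.k) * len(bucket) / (n * p)
        return total / len(self.tables)


def random_sampling(X, y, k, samples, rng):
    """Baseline: average w(x, y) over x drawn uniformly from X."""
    return sum(weight(rng.choice(X), y, k) for _ in range(samples)) / samples


def exact_mean(X, y, k):
    """Exact (1/n) * sum_x w(x, y), computed in linear time."""
    return sum(weight(x, y, k) for x in X) / len(X)


if __name__ == "__main__":
    rng = random.Random(0)
    d, n, k = 4, 100, 3
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]
    hbe = HashingBasedEstimator(X, k, 500, rng)
    print("exact:", exact_mean(X, y, k))
    print("hashing-based estimate:", hbe.query(y))
    print("random sampling:", random_sampling(X, y, k, 500, rng))
```

Each table's estimate w(x,y)·|B|/(n·p(x,y)) is unbiased because x lands in the query's bucket B with probability p(x,y). In this toy setting, choosing p = sqrt(w) reduces the estimate to p(x,y)·|B|/n, so it never exceeds |B|/n; this is one simple way to see why tying collision probabilities to the square root of w keeps the estimator's variance under control.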