Sparsifying Count Sketch
| Published in: | Information Processing Letters, Vol. 186, p. 106490 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier B.V., 01.08.2024 |
| Subjects: | |
| ISSN: | 0020-0190, 1872-6119 |
| Summary: | The seminal work of Charikar et al. [1], Count-Sketch, proposes a sketching algorithm for real-valued vectors that has been used for, among other tasks, frequency estimation in data streams and pairwise inner product estimation. One of the major advantages of Count-Sketch over similar sketching algorithms, such as random projection, is that its running time, as well as the sparsity of the sketch, depends on the sparsity of the input. Sparse datasets therefore enjoy space-efficient (sparse) sketches and faster running times. On dense datasets, however, these advantages of Count-Sketch over other baselines might be negligible. In this work, we address this challenge by suggesting a simple and effective approach that outputs an (asymptotically) sparser sketch than that obtained via Count-Sketch and, as a by-product, also achieves a faster running time. At the same time, the quality of our estimates closely approximates that of Count-Sketch. For the frequency estimation and pairwise inner product estimation problems, our proposal, Sparse-Count-Sketch, provides unbiased estimates. These estimates, however, have slightly higher variances than the corresponding estimates obtained via Count-Sketch. To address this issue, we present improved estimators for these problems based on maximum likelihood estimation (MLE) that offer smaller variances, even with respect to Count-Sketch. We provide a rigorous theoretical analysis of our proposal for frequency estimation in data streams and pairwise inner product estimation for real-valued vectors.
• We propose a sketching algorithm called Sparse Count Sketch (SCS).
• It efficiently estimates inner products and frequencies for streaming datasets.
• It offers a sparser sketch and a faster running time than vanilla Count-Sketch.
• To reduce the variance of SCS, we propose another estimator based on MLE.
• Our work is backed by rigorous theoretical analysis. |
|---|---|
| DOI: | 10.1016/j.ipl.2024.106490 |
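
For readers unfamiliar with the baseline, the following is a minimal sketch of vanilla single-row Count-Sketch, showing why its cost and the sparsity of its output follow the sparsity of the input. It is not the Sparse-Count-Sketch construction, which the summary does not describe. The function names (`make_hashes`, `count_sketch`, `estimate_entry`, `estimate_inner_product`) are hypothetical, and the explicit lookup tables stand in for the pairwise-independent hash functions an actual implementation would likely use.

```python
import random


def make_hashes(dim, width, seed=0):
    """Draw a bucket and a random sign for each of `dim` coordinates.

    Assumption: explicit tables replace real hash functions for clarity.
    """
    rng = random.Random(seed)
    buckets = [rng.randrange(width) for _ in range(dim)]
    signs = [rng.choice((-1, 1)) for _ in range(dim)]
    return buckets, signs


def count_sketch(vec, buckets, signs):
    """Sketch a sparse vector given as {index: value}.

    Only nonzero entries are visited, so the work is proportional to
    len(vec) and at most len(vec) buckets are ever occupied.
    """
    sketch = {}
    for i, value in vec.items():
        b = buckets[i]
        sketch[b] = sketch.get(b, 0.0) + signs[i] * value
    return sketch


def estimate_entry(sketch, i, buckets, signs):
    """Estimate coordinate i (its frequency, in the streaming setting)."""
    return signs[i] * sketch.get(buckets[i], 0.0)


def estimate_inner_product(sketch_x, sketch_y):
    """Estimate <x, y> as the dot product of the two sketches."""
    if len(sketch_x) > len(sketch_y):
        sketch_x, sketch_y = sketch_y, sketch_x
    return sum(v * sketch_y.get(b, 0.0) for b, v in sketch_x.items())


if __name__ == "__main__":
    buckets, signs = make_hashes(dim=1000, width=64, seed=42)
    x = {3: 2.0, 250: -1.0, 999: 4.0}
    y = {3: 1.5, 17: 3.0, 999: 0.5}
    sk_x = count_sketch(x, buckets, signs)
    sk_y = count_sketch(y, buckets, signs)
    print("true inner product:", sum(v * y.get(i, 0.0) for i, v in x.items()))
    print("estimated inner product:", estimate_inner_product(sk_x, sk_y))
    print("estimated x[999]:", estimate_entry(sk_x, 999, buckets, signs))
```

With a dense input every coordinate is visited and most buckets fill up, which is the situation the summary identifies as eroding the advantage of Count-Sketch over other baselines.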