An Information Divergence Estimation over Data Streams

Bibliographic details
Published in: 2012 IEEE 11th International Symposium on Network Computing and Applications, pp. 28-35
Main authors: Anceaume, E.; Busnel, Y.
Format: Conference paper
Language: English
Published: IEEE, 1 August 2012
ISBN: 9781467322140, 1467322148
Description
Summary: In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we proposed in prior work AnKLe, a one-pass algorithm for estimating the Kullback-Leibler divergence of an observed stream compared to the expected one. Experimental evaluations have shown that the estimation provided by AnKLe is accurate for different adversarial settings in which the quality of other methods decreases dramatically. In the present paper, with n denoting the number of distinct data items in a stream, we show that AnKLe is an (ε, δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε²) bits in "most" cases, and Õ(1/ε + (n − ε⁻¹)/ε²) bits otherwise. To the best of our knowledge, an approximation algorithm for estimating the Kullback-Leibler divergence has never been analyzed before.
DOI: 10.1109/NCA.2012.16
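
Note: for orientation, the quantity the summary refers to can be computed exactly with a naive baseline that stores one counter per distinct item, so its space grows linearly in n. The sketch below is not AnKLe and is not taken from the paper. The function name, the choice of base-2 logarithms, and the direction D(observed || expected) are assumptions made for illustration only.

```python
import math
from collections import Counter


def kl_divergence_from_stream(stream, expected):
    """Exact D(p || q) in bits (hypothetical helper, not the paper's algorithm).

    p is the empirical distribution of the items seen in `stream`;
    q is the expected distribution, given as a dict item -> probability.
    Uses one counter per distinct item, i.e. space linear in n.
    """
    counts = Counter(stream)          # single pass over the stream
    m = sum(counts.values())          # total number of items seen
    if m == 0:
        raise ValueError("empty stream")

    total = 0.0
    for item, c in counts.items():
        q = expected.get(item, 0.0)
        if q == 0.0:
            # An observed item with zero expected probability makes
            # the divergence infinite.
            return math.inf
        p = c / m
        total += p * math.log2(p / q)
    return total


# Usage: a stream matching a uniform expectation versus a fully skewed one.
uniform = {"a": 0.5, "b": 0.5}
balanced = kl_divergence_from_stream("abab", uniform)
skewed = kl_divergence_from_stream("aaaa", uniform)
```

An (ε, δ)-approximation algorithm such as the one analyzed in the paper aims to return a value within a relative error ε of this exact quantity with probability at least 1 − δ, while using far less memory than the per-item counters above.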