Approximately Detecting Duplicates for Probabilistic Data Streams over Sliding Windows

Detailed bibliography
Published in: 2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming, pp. 263-268
Main authors: Wang, Xiujun; Shen, Hong
Format: Conference paper
Language: English
Published: IEEE, 1 December 2010
ISBN: 1424494826, 9781424494828
ISSN: 2168-3034
Description
Summary: A probabilistic data stream $S$ is defined as a sequence of uncertain tuples $\langle t_i, p_i \rangle$, $i = 1, \ldots, \infty$, with the semantics that element $t_i$ occurs in the stream with probability $p_i \in (0, 1)$. Thus each distinct element $t$ that occurs in tuples of $S$ has an existential probability based on the tuples $\langle t_i = t, p_i \rangle \in S$. Existing duplicate detection methods for traditional deterministic data streams cannot maintain these existential probabilities for the elements of $S$, which are important query information. In this paper, we present a novel data structure, the Floating Counter Bloom Filter (FCBF), an extension of CBF that can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size $W$ and floating counter number $N$, for any $t$ that occurs in the past sliding window, our method outputs the accurate existential probability of $t$ with probability $1 - (1/2)^{\ln(2) \cdot N/W}$. Our experimental results on synthetic data verify the effectiveness of our approach.
DOI: 10.1109/PAAP.2010.16
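
To give a feel for the accuracy guarantee quoted in the summary, the following minimal sketch evaluates the bound for a few counter-to-window ratios. It assumes the bound reads 1 - (1/2)^(ln(2)·N/W), with ln(2)·N/W as the exponent; the function name `accuracy_lower_bound` and the sample values of W and N are hypothetical and do not come from the paper.

```python
import math


def accuracy_lower_bound(window_size: int, num_counters: int) -> float:
    """Evaluate 1 - (1/2)^(ln(2) * N / W).

    Assumption: ln(2) * N / W is the exponent of 1/2, as the bound in the
    summary is read here. W is the sliding window size and N is the number
    of floating counters.
    """
    if window_size <= 0 or num_counters <= 0:
        raise ValueError("window_size and num_counters must be positive")
    exponent = math.log(2) * num_counters / window_size
    return 1.0 - 0.5 ** exponent


if __name__ == "__main__":
    window_size = 1_000  # hypothetical W
    for ratio in (1, 4, 8, 16):  # hypothetical N / W ratios
        num_counters = ratio * window_size
        bound = accuracy_lower_bound(window_size, num_counters)
        print(f"N/W = {ratio:2d}: bound = {bound:.6f}")
```

Under this reading, the bound depends only on the ratio N/W and approaches 1 as more floating counters are allotted per window slot.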