Efficiently Summarizing Data Streams over Sliding Windows

Bibliographic Details
Published in: 2015 IEEE 14th International Symposium on Network Computing and Applications, pp. 151-158
Authors: Rivetti, Nicolo; Busnel, Yann; Mostefaoui, Achour
Format: Conference paper
Language: English
Published: IEEE, 2015-09-01
Abstract Estimating the frequency of any piece of information in large-scale distributed data streams has become of utmost importance over the last decade (e.g., in network monitoring and big data). Although some elegant solutions have been proposed recently, they compute their approximation from the inception of the stream. In a distributed runtime context, one would prefer to gather information only about the recent past, either to save resources or because recent information is more relevant. In this paper, we consider the sliding window model and propose two different online algorithms that approximate item frequencies in the active window. More precisely, we provide an (ε, δ)-additive approximation, meaning that the error exceeds ε with probability at most δ. These solutions use very little memory with respect to the window size N and the number n of distinct items in the stream, namely O((1/ε) log(1/δ) (log N + log n)) and O((1/(τε)) log(1/δ) (log N + log n)) bits of space, where τ is a parameter limiting memory usage. We also provide their distributed variants, i.e., in the sliding window functional monitoring model. We compared the proposed algorithms with each other and with the state of the art through extensive experiments on synthetic traces and real data sets, which validate the robustness and accuracy of our algorithms.
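To make the sliding window model concrete, here is a minimal Python sketch of the exact baseline: it keeps the last N items and reports each item's frequency in the active window. It is an illustration under stated assumptions, not either of the paper's algorithms; the class name `SlidingWindowCounter` and its methods are hypothetical. Because it stores the whole window, its memory grows linearly with N, which is the cost that summaries of O((1/ε) log(1/δ) (log N + log n)) bits are designed to avoid.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical helper: exact item frequencies over the last `window_size` items.

    Stores every item of the active window, so memory is O(N) items.
    This is the exact quantity that small-space sliding-window
    algorithms approximate; it is not the paper's method.
    """

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window: deque = deque()
        self.counts: Counter = Counter()

    def update(self, item) -> None:
        # Append the newest item to the active window.
        self.window.append(item)
        self.counts[item] += 1
        # Once the window holds more than N items, evict the oldest one.
        if len(self.window) > self.window_size:
            oldest = self.window.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def frequency(self, item) -> int:
        # Counter returns 0 for items absent from the active window.
        return self.counts[item]


if __name__ == "__main__":
    w = SlidingWindowCounter(window_size=4)
    for x in ["a", "b", "a", "c", "a", "d"]:
        w.update(x)
    # The active window is now ["a", "c", "a", "d"].
    print(w.frequency("a"), w.frequency("b"))  # prints "2 0"
```

An (ε, δ)-additive approximation replaces the exact `frequency` answer with an estimate whose error exceeds ε with probability at most δ, in exchange for memory that no longer scales with the number of items stored in the window.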
Author_xml – sequence: 1
  givenname: Nicolo
  surname: Rivetti
  fullname: Rivetti, Nicolo
  email: Nicolo.Rivetti@univ-nantes.fr
  organization: LINA, Univ. de Nantes, Nantes, France
– sequence: 2
  givenname: Yann
  surname: Busnel
  fullname: Busnel, Yann
  email: Yann.Busnel@ensai.fr
  organization: Crest, Inria, Rennes, France
– sequence: 3
  givenname: Achour
  surname: Mostefaoui
  fullname: Mostefaoui, Achour
  email: Achour.Mostefaoui@univ-nantes.fr
  organization: LINA, Univ. de Nantes, Nantes, France
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/NCA.2015.46
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1509018492
9781509018499
1509018484
9781509018482
EndPage 158
ExternalDocumentID 7371718
Genre orig-research
ISICitedReferencesCount 13
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000380522500024&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Dec 20 05:18:32 EST 2023
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
OpenAccessLink https://hal.science/hal-01535693
PageCount 8
PublicationCentury 2000
PublicationDate 20150901
PublicationDateYYYYMMDD 2015-09-01
PublicationDate_xml – month: 09
  year: 2015
  text: 20150901
  day: 01
PublicationDecade 2010
PublicationTitle 2015 IEEE 14th International Symposium on Network Computing and Applications
PublicationTitleAbbrev NCA
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 151
SubjectTerms Approximation methods
Complexity theory
Computational modeling
Data models
data stream
Estimation
Frequency estimation
Monitoring
randomized approximation algorithm
windowing model
Title Efficiently Summarizing Data Streams over Sliding Windows
URI https://ieeexplore.ieee.org/document/7371718
WOSCitedRecordID wos000380522500024
hasFullText 1
inHoldings 1