Efficiently Summarizing Data Streams over Sliding Windows
Estimating the frequency of any piece of information in large-scale distributed data streams has become of utmost importance over the last decade (e.g., in the context of network monitoring and big data). Although some elegant solutions have been proposed recently, their approximation is computed from the in...
| Published in: | 2015 IEEE 14th International Symposium on Network Computing and Applications, pp. 151-158 |
|---|---|
| Main authors: | Rivetti, Nicolo; Busnel, Yann; Mostefaoui, Achour |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 2015-09-01 |
| Keywords: | |
| Online access: | Full text |
| Abstract | Estimating the frequency of any piece of information in large-scale distributed data streams has become of utmost importance over the last decade (e.g., in the context of network monitoring and big data). Although some elegant solutions have been proposed recently, their approximation is computed from the inception of the stream. In a runtime distributed context, one would prefer to gather information only about the recent past. This may be motivated by the need to save resources or by the fact that recent information is more relevant. In this paper, we consider the sliding window model and propose two different (on-line) algorithms that approximate the item frequencies in the active window. More precisely, we determine an (ε, δ)-additive approximation, meaning that the error is greater than ε only with probability at most δ. These solutions use a very small amount of memory with respect to the size N of the window and the number n of distinct items of the stream, namely O((1/ε) log(1/δ) (log N + log n)) and O((1/τε) log(1/δ) (log N + log n)) bits of space, where τ is a parameter limiting memory usage. We also provide their distributed variant, i.e., considering the sliding window functional monitoring model. We compared the proposed algorithms to each other and to the state of the art through extensive experiments on synthetic traces and real data sets, which validate the robustness and accuracy of our algorithms. |
|---|---|
| Author | Busnel, Yann; Rivetti, Nicolo; Mostefaoui, Achour |
| Author_xml | – sequence: 1 givenname: Nicolo surname: Rivetti fullname: Rivetti, Nicolo email: Nicolo.Rivetti@univ-nantes.fr organization: LINA, Univ. de Nantes, Nantes, France – sequence: 2 givenname: Yann surname: Busnel fullname: Busnel, Yann email: Yann.Busnel@ensai.fr organization: Crest, Inria, Rennes, France – sequence: 3 givenname: Achour surname: Mostefaoui fullname: Mostefaoui, Achour email: Achour.Mostefaoui@univ-nantes.fr organization: LINA, Univ. de Nantes, Nantes, France |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/NCA.2015.46 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1509018492 9781509018499 1509018484 9781509018482 |
| EndPage | 158 |
| ExternalDocumentID | 7371718 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ALMA_UNASSIGNED_HOLDINGS CBEJK RIB RIC RIE RIL |
| ID | FETCH-LOGICAL-h279t-bc0d9abb822d5d362fd87bdc859acdae79278101b5c2c76b6f23ab9f58b893cc3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 13 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000380522500024&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Dec 20 05:18:32 EST 2023 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-h279t-bc0d9abb822d5d362fd87bdc859acdae79278101b5c2c76b6f23ab9f58b893cc3 |
| OpenAccessLink | https://hal.science/hal-01535693 |
| PageCount | 8 |
| ParticipantIDs | ieee_primary_7371718 |
| PublicationCentury | 2000 |
| PublicationDate | 20150901 |
| PublicationDateYYYYMMDD | 2015-09-01 |
| PublicationDate_xml | – month: 09 year: 2015 text: 20150901 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | 2015 IEEE 14th International Symposium on Network Computing and Applications |
| PublicationTitleAbbrev | NCA |
| PublicationYear | 2015 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib026764809 |
| Score | 1.689946 |
| Snippet | Estimating the frequency of any piece of information in large-scale distributed data streams has become of utmost importance over the last decade (e.g., in the... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 151 |
| SubjectTerms | Approximation methods; Complexity theory; Computational modeling; Data models; data stream; Estimation; Frequency estimation; Monitoring; randomized approximation algorithm; windowing model |
| Title | Efficiently Summarizing Data Streams over Sliding Windows |
| URI | https://ieeexplore.ieee.org/document/7371718 |
| WOSCitedRecordID | wos000380522500024&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
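
The abstract describes approximating item frequencies over a sliding window with an (ε, δ)-additive guarantee. The record does not include the paper's two algorithms, so the following is only a minimal baseline sketch of the problem setting, not the authors' method: a standard Count-Min sketch whose counters are decremented when an item leaves the window. The class name `WindowedCountMin`, its parameters, and the choice of hash family are assumptions made for illustration. Note that this baseline stores the whole window in a queue, so it uses memory linear in N, which is exactly the cost that the space bounds quoted in the abstract avoid.

```python
import math
import random
from collections import deque


class WindowedCountMin:
    """Count-Min sketch over the last `window` items.

    Hypothetical baseline for illustration only: it keeps the window
    contents explicitly (O(window) memory), unlike the small-space
    algorithms summarized in the abstract.
    """

    _PRIME = (1 << 61) - 1  # Mersenne prime used by the hash family

    def __init__(self, epsilon, delta, window, seed=0):
        # Usual Count-Min dimensions: width ~ e/epsilon, depth ~ ln(1/delta).
        self.width = math.ceil(math.e / epsilon)
        self.depth = max(1, math.ceil(math.log(1.0 / delta)))
        self.window = window
        rng = random.Random(seed)
        # One (a, b) pair per row for h(x) = ((a*x + b) mod p) mod width.
        self._hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(self.depth)
        ]
        self._rows = [[0] * self.width for _ in range(self.depth)]
        self._recent = deque()  # items currently inside the window

    def _cells(self, item):
        # Column index of `item` in each row.
        x = hash(item) % self._PRIME
        return [((a * x + b) % self._PRIME) % self.width
                for a, b in self._hashes]

    def _add(self, item, amount):
        for row, col in zip(self._rows, self._cells(item)):
            row[col] += amount

    def update(self, item):
        """Record a new item and expire the oldest one if the window is full."""
        self._add(item, 1)
        self._recent.append(item)
        if len(self._recent) > self.window:
            self._add(self._recent.popleft(), -1)

    def estimate(self, item):
        """Return an estimate of the item's count in the current window.

        The estimate never undercounts, because every counter holds the
        exact sum of the window counts of the items hashed to it.
        """
        return min(row[col]
                   for row, col in zip(self._rows, self._cells(item)))
```

Because expirations cancel insertions exactly, the counters always equal those of a plain Count-Min sketch built from the current window contents alone. Under the usual Count-Min analysis, this means the overestimate should exceed ε times the number of items in the window only with probability at most δ. Replacing the explicit queue with a compact summary of what has expired is the part that a small-space sliding-window algorithm has to solve.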