A Distributed Information Divergence Estimation over Data Streams
In this paper, we consider the setting of large scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the a...
| Published in: | IEEE transactions on parallel and distributed systems Vol. 25; no. 2; pp. 478 - 487 |
|---|---|
| Main Authors: | Anceaume, Emmanuelle; Busnel, Yann |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE (Institute of Electrical and Electronics Engineers), 01.02.2014 |
| Subjects: | |
| ISSN: | 1045-9219 |
| Abstract | In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we propose a novel algorithm, AnKLe, for estimating the Kullback-Leibler divergence of an observed stream compared with the expected one. AnKLe combines sampling techniques and information-theoretic methods. It is very efficient, in terms of both space and time complexity, and requires only a single pass over the data stream. We show that AnKLe is an (ε, δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε²) bits in "most" cases, and Õ(1/ε + (n − ε⁻¹)/ε²) otherwise, where n is the number of distinct data items in a stream. Moreover, we propose a distributed version of AnKLe that requires at most O(rℓ(log n + 1)) bits of communication between the ℓ participating nodes, where r is the number of rounds of the algorithm. Experimental results show that the estimation provided by AnKLe remains accurate even in adversarial settings in which the quality of other methods decreases dramatically. |
|---|---|
| Author | Busnel, Yann; Anceaume, Emmanuelle |
| Author_xml | 1. Anceaume, Emmanuelle (IRISA, Rennes, France; Emmanuelle.Anceaume@irisa.fr); 2. Busnel, Yann (Dept. of Comput. Sci., Univ. de Nantes, Nantes, France; Yann.Busnel@univ-nantes.fr) |
| BackLink | https://hal.science/hal-00998708$$DView record in HAL |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1002_widm_1247 crossref_primary_10_1016_j_jss_2017_03_057 crossref_primary_10_1016_j_eswa_2015_07_027 |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| DBID | 97E RIA RIE AAYXX CITATION 1XC VOOES |
| DOI | 10.1109/TPDS.2013.101 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Hyper Article en Ligne (HAL); Hyper Article en Ligne (HAL) (Open Access) |
| DatabaseTitle | CrossRef |
| Discipline | Engineering; Computer Science |
| EndPage | 487 |
| ExternalDocumentID | oai:HAL:hal-00998708v1 10_1109_TPDS_2013_101 6494567 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 10 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | Randomized approximation algorithm; Data stream; Divergence |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0003-4158-149X 0000-0001-6908-719X |
| OpenAccessLink | https://hal.science/hal-00998708 |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-02-01 |
| PublicationDecade | 2010 |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2014 |
| Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
| SourceID | hal; crossref; ieee |
| SourceType | Open Access Repository; Enrichment Source; Index Database; Publisher |
| StartPage | 478 |
| SubjectTerms | Algorithm design and analysis; Approximation algorithms; byzantine adversary; Computational modeling; Computer Science; Data models; Data stream; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Entropy; Estimation; Kullback-Leibler divergence; performance analysis; Radiation detectors; randomized approximation algorithm |
| Title | A Distributed Information Divergence Estimation over Data Streams |
| URI | https://ieeexplore.ieee.org/document/6494567 https://hal.science/hal-00998708 |
| Volume | 25 |
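
As a point of reference for the quantity described in the abstract, the following Python sketch computes the Kullback-Leibler divergence of an observed stream exactly. It is not AnKLe: it keeps one counter per distinct item (linear space), whereas the paper's algorithm is a small-space approximation. The function name `kl_divergence_from_uniform`, the choice of a uniform expected distribution, the direction D(observed ‖ expected), and the use of base-2 logarithms are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence_from_uniform(stream, n):
    """Exact D(q || p) in bits.

    q is the empirical distribution of the items seen in `stream`;
    p is assumed (for illustration) to be uniform over n possible items.
    This is an exact, linear-space baseline, not the AnKLe sketch.
    """
    counts = Counter(stream)          # one counter per distinct item
    m = sum(counts.values())          # stream length
    if m == 0:
        raise ValueError("stream is empty")
    if len(counts) > n:
        raise ValueError("stream has more distinct items than n")

    # D(q || p) = sum_i q_i * log2(q_i / p_i), with p_i = 1/n.
    # Items that never occur contribute 0 and are skipped.
    return sum((c / m) * math.log2((c / m) * n) for c in counts.values())
```

Under these assumptions, a stream in which each of the n items appears equally often yields 0 bits, and a stream consisting of a single repeated item yields log2(n) bits, the largest possible value against a uniform reference.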