A Distributed Information Divergence Estimation over Data Streams

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 25, no. 2, pp. 478-487
Main Authors: Anceaume, Emmanuelle; Busnel, Yann
Format: Journal Article
Language: English
Published: IEEE (Institute of Electrical and Electronics Engineers), 01.02.2014
ISSN: 1045-9219
Abstract: In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we propose a novel algorithm, AnKLe, for estimating the Kullback-Leibler divergence of an observed stream compared with the expected one. AnKLe combines sampling techniques and information-theoretic methods. It is very efficient in terms of both space and time complexity, and requires only a single pass over the data stream. We show that AnKLe is an (ε, δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε²) bits in "most" cases, and Õ(1/ε + (n - ε⁻¹)/ε²) otherwise, where n is the number of distinct data items in a stream. Moreover, we propose a distributed version of AnKLe that requires at most O(rℓ(log n + 1)) bits of communication between the ℓ participating nodes, where r is the number of rounds of the algorithm. Experimental results show that the estimation provided by AnKLe remains accurate even in adversarial settings for which the quality of other methods dramatically decreases.
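To make the estimated quantity concrete, the following is a minimal sketch of the exact Kullback-Leibler divergence between the empirical distribution of an observed stream and an expected distribution. It is not AnKLe: it keeps one counter per distinct item (linear space in n), whereas the abstract describes a sublinear-space, single-pass approximation. The function names `kl_divergence` and `uniform`, the direction D(observed || expected), the base-2 logarithm, and the use of a uniform expected distribution in the usage lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(stream, expected):
    """Exact D(q || p) in bits, where q is the empirical distribution of
    `stream` and p is the `expected` distribution (dict: item -> probability).

    Hypothetical baseline for illustration only; it stores one counter per
    distinct item, so it does not have AnKLe's space guarantees.
    """
    counts = Counter(stream)          # single pass over the stream
    m = sum(counts.values())          # stream length
    if m == 0:
        raise ValueError("empty stream")
    total = 0.0
    for item, c in counts.items():
        p = expected.get(item, 0.0)
        if p == 0.0:
            # An observed item the expected distribution rules out:
            # the divergence is infinite.
            return math.inf
        q = c / m
        total += q * math.log2(q / p)
    return total


def uniform(items):
    """Uniform distribution over the given items (assumed expected stream)."""
    items = list(items)
    return {x: 1.0 / len(items) for x in items}


if __name__ == "__main__":
    expected = uniform("abcd")
    print(kl_divergence("abcd", expected))   # stream matching the expectation
    print(kl_divergence("aaaa", expected))   # stream flooded with one item
```

A stream that matches the expected distribution gives a divergence of zero, while a stream flooded with a single item moves away from it; an approximation algorithm in the sense of the abstract would return a value within a relative error ε of this quantity with probability at least 1 - δ.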
Authors:
  1. Anceaume, Emmanuelle (IRISA, Rennes, France), Emmanuelle.Anceaume@irisa.fr
  2. Busnel, Yann (Dept. of Comput. Sci., Univ. de Nantes, Nantes, France), Yann.Busnel@univ-nantes.fr
Copyright Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1109/TPDS.2013.101
Discipline Engineering
Computer Science
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords Randomized approximation algorithm
Data stream
Divergence
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0003-4158-149X
0000-0001-6908-719X
OpenAccessLink https://hal.science/hal-00998708
PageCount 10
SubjectTerms Algorithm design and analysis
Approximation algorithms
byzantine adversary
Computational modeling
Computer Science
Data models
Data stream
Data Structures and Algorithms
Distributed, Parallel, and Cluster Computing
Entropy
Estimation
Kullback-Leibler divergence
performance analysis
Radiation detectors
randomized approximation algorithm
URI https://ieeexplore.ieee.org/document/6494567
https://hal.science/hal-00998708