Towards Provenance-Based Anomaly Detection in MapReduce


Bibliographic Details
Published in: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 647-656
Main Authors: Liao, Cong; Squicciarini, Anna
Format: Conference Proceeding
Language: English
Published: IEEE 01.05.2015
Abstract: MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is subject to threats posed by malicious and cheating nodes or compromised user-submitted code that could tamper with data and computation, since users maintain little control as the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants against aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
Author Affiliations:
Cong Liao (cxl491@psu.edu): Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA, USA
Anna Squicciarini (acs20@psu.edu): Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA, USA
CODEN IEEPAD
DOI 10.1109/CCGrid.2015.16
EISBN 9781479980062
1479980064
ISICitedReferencesCount 12
Subject Terms: Access control; Cloud computing; computation integrity; Containers; Distributed databases; logging; MapReduce; Monitoring; provenance; Yarn
URI https://ieeexplore.ieee.org/document/7152530