Towards Provenance-Based Anomaly Detection in MapReduce
| Published in: | 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 647-656 |
|---|---|
| Main Authors: | Cong Liao, Anna Squicciarini |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.05.2015 |

| Abstract | MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is exposed to threats from malicious or cheating nodes and from compromised user-submitted code, which could tamper with data and computation, because users retain little control once the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. To this end, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants over aggregated provenance information, which are then analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of the proposed provenance system. |
|---|---|
| Author | Liao, Cong; Squicciarini, Anna |
| Author affiliations | Cong Liao (cxl491@psu.edu) and Anna Squicciarini (acs20@psu.edu), College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CCGrid.2015.16 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781479980062 (ISBN-13); 1479980064 (ISBN-10) |
| EndPage | 656 |
| ExternalDocumentID | 7152530 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 12 |
| IngestDate | Wed Aug 27 02:46:56 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2015-May |
| PublicationDecade | 2010 |
| PublicationTitle | 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing |
| PublicationTitleAbbrev | CCGrid |
| PublicationYear | 2015 |
| Publisher | IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 647 |
| SubjectTerms | Access control; Cloud computing; computation integrity; Containers; Distributed databases; logging; MapReduce; Monitoring; provenance; Yarn |
| Title | Towards Provenance-Based Anomaly Detection in MapReduce |
| URI | https://ieeexplore.ieee.org/document/7152530 |
| hasFullText | 1 |
| inHoldings | 1 |
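
The abstract describes detecting tampering by checking invariants over aggregated provenance information. This record does not specify the paper's provenance schema or its actual invariants, so the following is only a minimal sketch under stated assumptions: a hypothetical `ProvenanceRecord` (task ID, phase, intermediate key, value count) and two plausible MapReduce invariants chosen for illustration, namely that values emitted by mappers for a key equal values received by reducers for that key (assuming no combiner), and that each key reaches exactly one reduce task (assuming hash partitioning). All names here are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry: one task's view of one intermediate key."""
    task_id: str  # e.g. "m_0" for a map task, "r_0" for a reduce task
    phase: str    # "map" (values emitted) or "reduce" (values received)
    key: str      # intermediate key
    count: int    # number of values for that key


def aggregate(records, phase):
    """Sum value counts per intermediate key over all tasks of one phase."""
    totals = Counter()
    for r in records:
        if r.phase == phase:
            totals[r.key] += r.count
    return totals


def check_shuffle_invariant(records):
    """Assumed invariant (no combiner): values emitted by mappers for a key
    equal values received by reducers for that key. Returns violating keys."""
    emitted = aggregate(records, "map")
    received = aggregate(records, "reduce")
    keys = set(emitted) | set(received)
    # Counter returns 0 for absent keys, so a key missing on one side is flagged.
    return sorted(k for k in keys if emitted[k] != received[k])


def check_single_reducer_invariant(records):
    """Assumed invariant (hash partitioning): each key reaches exactly one
    reduce task. Returns keys seen by more than one reducer."""
    owners = defaultdict(set)
    for r in records:
        if r.phase == "reduce":
            owners[r.key].add(r.task_id)
    return sorted(k for k, tasks in owners.items() if len(tasks) > 1)


if __name__ == "__main__":
    log = [
        ProvenanceRecord("m_0", "map", "apple", 3),
        ProvenanceRecord("m_1", "map", "apple", 2),
        ProvenanceRecord("m_1", "map", "pear", 4),
        ProvenanceRecord("r_0", "reduce", "apple", 5),
        ProvenanceRecord("r_0", "reduce", "pear", 3),  # one "pear" value lost
    ]
    print(check_shuffle_invariant(log))         # ['pear']
    print(check_single_reducer_invariant(log))  # []
```

Combiners legitimately change per-key counts between map output and reduce input, so the shuffle check as written would need adjusting for jobs that use one.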