An Autocorrelation-based LSTM-Autoencoder for Anomaly Detection on Time-Series Data

Bibliographic details
Published in: 2020 IEEE International Conference on Big Data (Big Data), pp. 5068-5077
Main authors: Homayouni, Hajar; Ghosh, Sudipto; Ray, Indrakshi; Gondalia, Shlok; Duggan, Jerry; Kahn, Michael G.
Format: Conference paper
Language: English
Published: IEEE, 10 December 2020
Abstract Data quality significantly impacts the results of data analytics. Researchers have proposed machine learning based anomaly detection techniques to identify incorrect data. Existing approaches fail to (1) identify the underlying domain constraints violated by the anomalous data, and (2) generate explanations of these violations in a form comprehensible to domain experts. We propose IDEAL, which is an LSTM-Autoencoder based approach that detects anomalies in multivariate time-series data, generates domain constraints, and reports subsequences that violate the constraints as anomalies. We propose an automated autocorrelation-based windowing approach to adjust the network input size, thereby improving the correctness and performance of constraint discovery over manual and brute-force approaches. The anomalies are visualized in a manner comprehensible to domain experts in the form of decision trees extracted from a random forest classifier. Domain experts can then provide feedback to retrain the learning model and improve the accuracy of the process. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy improves after incorporating domain expert feedback.
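The abstract mentions an automated autocorrelation-based windowing step that sets the network input size, but this record does not describe the algorithm itself. The following is only a minimal sketch of one plausible reading, assuming the window size is taken as the lag of the strongest local peak in the sample autocorrelation function. The function names `autocorrelation` and `suggest_window_size`, the `max_lag` parameter, and the peak-picking rule are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        # Constant series: autocorrelation is undefined, report zeros.
        return np.zeros(max_lag + 1)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])


def suggest_window_size(x, max_lag, default=None):
    """Return the lag of the highest local autocorrelation peak, or `default` if none.

    Assumption (not from the paper): a strong peak at lag k suggests the series
    repeats roughly every k steps, so k is a reasonable input window length.
    """
    acf = autocorrelation(x, max_lag)
    peaks = [
        k for k in range(1, max_lag)
        if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]
    ]
    if not peaks:
        return default
    return max(peaks, key=lambda k: acf[k])


# Usage: a synthetic series with a period of 24 samples.
t = np.arange(240)
series = np.sin(2 * np.pi * t / 24)
window = suggest_window_size(series, max_lag=60)
print(window)  # prints 24
```

In a pipeline like the one the abstract outlines, a value chosen this way would replace a manually tuned or brute-force searched window length before the series is cut into subsequences for the autoencoder. For multivariate data, some rule for combining per-attribute lags would also be needed, and this record does not say how the authors handle that.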
Author Ray, Indrakshi
Kahn, Michael G.
Duggan, Jerry
Ghosh, Sudipto
Gondalia, Shlok
Homayouni, Hajar
Author_xml – sequence: 1
  givenname: Hajar
  surname: Homayouni
  fullname: Homayouni, Hajar
  email: hhajar@colostate.edu
  organization: Colorado State University, Department of Computer Science
– sequence: 2
  givenname: Sudipto
  surname: Ghosh
  fullname: Ghosh, Sudipto
  email: ghosh@colostate.edu
  organization: Colorado State University, Department of Computer Science
– sequence: 3
  givenname: Indrakshi
  surname: Ray
  fullname: Ray, Indrakshi
  email: iray@colostate.edu
  organization: Colorado State University, Department of Computer Science
– sequence: 4
  givenname: Shlok
  surname: Gondalia
  fullname: Gondalia, Shlok
  email: shlok@rams.colostate.edu
  organization: Colorado State University, Department of Computer Science
– sequence: 5
  givenname: Jerry
  surname: Duggan
  fullname: Duggan, Jerry
  email: jerry.duggan@colostate.edu
  organization: Energy Institute Colorado State University
– sequence: 6
  givenname: Michael G.
  surname: Kahn
  fullname: Kahn, Michael G.
  email: michael.kahn@cuanschutz.edu
  organization: Anschutz Medical Campus University of Colorado
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/BigData50022.2020.9378192
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781728162515
1728162513
EndPage 5077
ExternalDocumentID 9378192
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i203t-b8a6c85f62b5552c25e468e2b1b07f5956c27fc9da40138e799189bf5af0e96a3
IEDL.DBID RIE
ISICitedReferencesCount 39
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000662554705019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Thu Jun 29 18:39:22 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i203t-b8a6c85f62b5552c25e468e2b1b07f5956c27fc9da40138e799189bf5af0e96a3
PageCount 10
ParticipantIDs ieee_primary_9378192
PublicationCentury 2000
PublicationDate 2020-Dec.-10
PublicationDateYYYYMMDD 2020-12-10
PublicationDate_xml – month: 12
  year: 2020
  text: 2020-Dec.-10
  day: 10
PublicationDecade 2020
PublicationTitle 2020 IEEE International Conference on Big Data (Big Data)
PublicationTitleAbbrev Big Data
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.0385725
SourceID ieee
SourceType Publisher
StartPage 5068
SubjectTerms Anomaly detection
Autocorrelation
Big Data
Data quality tests
Decision trees
Explainability
LSTM-Autoencoder
Manuals
NASA
Random forests
Servers
Time series
Title An Autocorrelation-based LSTM-Autoencoder for Anomaly Detection on Time-Series Data
URI https://ieeexplore.ieee.org/document/9378192
WOSCitedRecordID wos000662554705019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2020+IEEE+International+Conference+on+Big+Data+%28Big+Data%29&rft.atitle=An+Autocorrelation-based+LSTM-Autoencoder+for+Anomaly+Detection+on+Time-Series+Data&rft.au=Homayouni%2C+Hajar&rft.au=Ghosh%2C+Sudipto&rft.au=Ray%2C+Indrakshi&rft.au=Gondalia%2C+Shlok&rft.date=2020-12-10&rft.pub=IEEE&rft.spage=5068&rft.epage=5077&rft_id=info:doi/10.1109%2FBigData50022.2020.9378192&rft.externalDocID=9378192