An Autocorrelation-based LSTM-Autoencoder for Anomaly Detection on Time-Series Data
| Published in: | 2020 IEEE International Conference on Big Data (Big Data), pp. 5068-5077 |
|---|---|
| Main authors: | Homayouni, Hajar; Ghosh, Sudipto; Ray, Indrakshi; Gondalia, Shlok; Duggan, Jerry; Kahn, Michael G. |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 10 December 2020 |
| Subjects: | Anomaly detection; Autocorrelation; Big Data; Data quality tests; Decision trees; Explainability; LSTM-Autoencoder; Manuals; NASA; Random forests; Servers; Time series |
| Online access: | Get full text |
| Abstract | Data quality significantly impacts the results of data analytics. Researchers have proposed machine learning based anomaly detection techniques to identify incorrect data. Existing approaches fail to (1) identify the underlying domain constraints violated by the anomalous data, and (2) generate explanations of these violations in a form comprehensible to domain experts. We propose IDEAL, which is an LSTM-Autoencoder based approach that detects anomalies in multivariate time-series data, generates domain constraints, and reports subsequences that violate the constraints as anomalies. We propose an automated autocorrelation-based windowing approach to adjust the network input size, thereby improving the correctness and performance of constraint discovery over manual and brute-force approaches. The anomalies are visualized in a manner comprehensible to domain experts in the form of decision trees extracted from a random forest classifier. Domain experts can then provide feedback to retrain the learning model and improve the accuracy of the process. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy improves after incorporating domain expert feedback. |
|---|---|
| Author | Ray, Indrakshi; Kahn, Michael G.; Duggan, Jerry; Ghosh, Sudipto; Gondalia, Shlok; Homayouni, Hajar |
| Author_xml | 1. Homayouni, Hajar (hhajar@colostate.edu), Colorado State University, Department of Computer Science; 2. Ghosh, Sudipto (ghosh@colostate.edu), Colorado State University, Department of Computer Science; 3. Ray, Indrakshi (iray@colostate.edu), Colorado State University, Department of Computer Science; 4. Gondalia, Shlok (shlok@rams.colostate.edu), Colorado State University, Department of Computer Science; 5. Duggan, Jerry (jerry.duggan@colostate.edu), Energy Institute, Colorado State University; 6. Kahn, Michael G. (michael.kahn@cuanschutz.edu), Anschutz Medical Campus, University of Colorado |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/BigData50022.2020.9378192 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781728162515 1728162513 |
| EndPage | 5077 |
| ExternalDocumentID | 9378192 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL CBEJK RIE RIL |
| ID | FETCH-LOGICAL-i203t-b8a6c85f62b5552c25e468e2b1b07f5956c27fc9da40138e799189bf5af0e96a3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 39 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000662554705019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Thu Jun 29 18:39:22 EDT 2023 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-b8a6c85f62b5552c25e468e2b1b07f5956c27fc9da40138e799189bf5af0e96a3 |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_9378192 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Dec.-10 |
| PublicationDateYYYYMMDD | 2020-12-10 |
| PublicationDate_xml | – month: 12 year: 2020 text: 2020-Dec.-10 day: 10 |
| PublicationDecade | 2020 |
| PublicationTitle | 2020 IEEE International Conference on Big Data (Big Data) |
| PublicationTitleAbbrev | Big Data |
| PublicationYear | 2020 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 2.0385725 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 5068 |
| SubjectTerms | Anomaly detection; Autocorrelation; Big Data; Data quality tests; Decision trees; Explainability; LSTM-Autoencoder; Manuals; NASA; Random forests; Servers; Time series |
| Title | An Autocorrelation-based LSTM-Autoencoder for Anomaly Detection on Time-Series Data |
| URI | https://ieeexplore.ieee.org/document/9378192 |
| WOSCitedRecordID | wos000662554705019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
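
The abstract describes an automated, autocorrelation-based windowing approach for choosing the network input size, but this record does not give the algorithm. The sketch below is only an illustration of the general idea, not the authors' IDEAL implementation: it assumes a univariate series and picks the lag at the strongest positive peak of the sample autocorrelation function as the window size. The function names (`autocorrelation`, `suggest_window_size`, `make_windows`) and the `max_lag` and `default` parameters are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # constant series: no usable correlation structure
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def suggest_window_size(series, max_lag=None, default=1):
    """Return the lag of the strongest positive autocorrelation peak.

    Assumption for this sketch: a "peak" is a lag whose autocorrelation
    exceeds that of the previous lag and is not below that of the next.
    If no positive peak exists, `default` is returned.
    """
    n = len(series)
    if max_lag is None:
        max_lag = n // 2
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    best_lag, best_value = default, 0.0
    for lag in range(2, max_lag):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > best_value:
            best_lag, best_value = lag, acf[lag]
    return best_lag


def make_windows(series, size):
    """Slice the series into overlapping subsequences of length `size`."""
    return [series[i:i + size] for i in range(len(series) - size + 1)]


if __name__ == "__main__":
    # Synthetic signal with a period of 12 samples.
    signal = [math.sin(2 * math.pi * t / 12) for t in range(120)]
    size = suggest_window_size(signal)
    print(size, len(make_windows(signal, size)))
```

For the synthetic sine signal with a period of 12 samples, the suggested window size is 12, and the series is cut into 109 overlapping windows that could serve as fixed-length inputs to a sequence model. A multivariate dataset, as targeted by the paper, would need some rule for combining per-attribute lags; that choice is left open here.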