An Autocorrelation-based LSTM-Autoencoder for Anomaly Detection on Time-Series Data

Data quality significantly impacts the results of data analytics. Researchers have proposed machine learning based anomaly detection techniques to identify incorrect data. Existing approaches fail to (1) identify the underlying domain constraints violated by the anomalous data, and (2) generate expl...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:2020 IEEE International Conference on Big Data (Big Data) s. 5068 - 5077
Hlavní autori: Homayouni, Hajar, Ghosh, Sudipto, Ray, Indrakshi, Gondalia, Shlok, Duggan, Jerry, Kahn, Michael G.
Médium: Konferenčný príspevok..
Jazyk:English
Vydavateľské údaje: IEEE 10.12.2020
Predmet:
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Data quality significantly impacts the results of data analytics. Researchers have proposed machine learning based anomaly detection techniques to identify incorrect data. Existing approaches fail to (1) identify the underlying domain constraints violated by the anomalous data, and (2) generate explanations of these violations in a form comprehensible to domain experts. We propose IDEAL, which is an LSTM-Autoencoder based approach that detects anomalies in multivariate time-series data, generates domain constraints, and reports subsequences that violate the constraints as anomalies. We propose an automated autocorrelation-based windowing approach to adjust the network input size, thereby improving the correctness and performance of constraint discovery over manual and brute-force approaches. The anomalies are visualized in a manner comprehensible to domain experts in the form of decision trees extracted from a random forest classifier. Domain experts can then provide feedback to retrain the learning model and improve the accuracy of the process. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy improves after incorporating domain expert feedback.
DOI:10.1109/BigData50022.2020.9378192