Ensemble unsupervised autoencoders and Gaussian mixture model for cyberattack detection

Bibliographic details
Published in:	Information processing & management, Vol. 59, No. 2, p. 102844
Main authors:	An, Peng; Wang, Zhiyuan; Zhang, Chunjiong
Format:	Journal Article
Language:	English
Published:	Oxford: Elsevier Ltd; Elsevier Science Ltd, 1 March 2022
Subjects:	Algorithms; Anomalies; Cyberattacks; Cybercrime; Data; Deep autoencoder; Density; Domains; Experiments; GMM; Machine learning; Multidomain data; Normal distribution; Optimization; Outliers (statistics); Probabilistic models; Robustness; Skewness
ISSN:	0306-4573, 1873-5371
Online access:	Full text

Description
Abstract:	Previous studies have adopted unsupervised machine learning with dimension reduction for cyberattack detection, but these approaches are limited in their ability to perform robust anomaly detection on high-dimensional, sparse data. Most assume homogeneous parameters with a specific Gaussian distribution for each domain and do not test robustness to data skewness. This paper proposes unsupervised ensemble autoencoders connected to a Gaussian mixture model (GMM) to adapt to multiple domains regardless of the skewness of each domain. In the hidden space of the ensemble autoencoder, the attention-based latent representation and the reconstructed features with minimum error are used. The expectation maximization (EM) algorithm is used to estimate the sample density in the GMM. When the estimated sample density exceeds the threshold learned in the training phase, the sample is identified as an outlier related to an attack anomaly. Finally, the ensemble autoencoder and the GMM are jointly optimized, which transforms the optimization of the objective function into a Lagrangian dual problem. Experiments on three public data sets show that the proposed model is competitive with the selected anomaly detection baselines.
• An ensemble framework for multichannel network anomaly detection that combines deep autoencoders and the GMM.
• A robust optimization version of EM3 for multiple domains, which transforms the optimization problem of the objective function into a Lagrangian dual.
• We derive the formulas, analyze convergence, and prove that our model is stable and robust.
• To the best of our knowledge, this is the first work that runs algorithms on both differentiated data domains and data distributions.
DOI:	10.1016/j.ipm.2021.102844
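
The detection step summarized in the abstract (fit a GMM with EM, learn a threshold on training data, flag samples beyond it) can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the ensemble autoencoder, attention mechanism, and joint Lagrangian-dual optimization, and works on one-dimensional values standing in for the latent features. All function names are hypothetical. As an assumption, samples are scored by their negative log-density ("energy"), so a higher score means a less likely sample, and the threshold is taken as a quantile of the training scores.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_em(xs, k=2, iters=50, seed=0):
    """Fit a 1-D, k-component GMM with plain EM (illustrative only)."""
    rng = random.Random(seed)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-6)
    weights = [1.0 / k] * k
    means = rng.sample(xs, k)          # initialise means at random samples
    variances = [var_all] * k
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample
        resp = []
        for x in xs:
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = log_sum_exp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid collapse
    return weights, means, variances


def energy(x, params):
    """Negative log-density under the mixture (assumed anomaly score)."""
    weights, means, variances = params
    return -log_sum_exp([math.log(w) + log_gauss(x, m, v)
                         for w, m, v in zip(weights, means, variances)])


def fit_threshold(train, params, quantile=0.99):
    """Threshold = chosen quantile of the training energies (an assumption)."""
    scores = sorted(energy(x, params) for x in train)
    return scores[min(int(quantile * len(scores)), len(scores) - 1)]


def is_attack(x, params, threshold):
    """Flag a sample whose energy exceeds the learned threshold."""
    return energy(x, params) > threshold


# Synthetic "normal" data from two clusters, standing in for latent features.
_rng = random.Random(42)
train = ([_rng.gauss(0.0, 1.0) for _ in range(300)]
         + [_rng.gauss(8.0, 1.0) for _ in range(300)])
params = fit_gmm_em(train, k=2)
threshold = fit_threshold(train, params)
```

With these synthetic data, `is_attack(30.0, params, threshold)` flags a point far from both clusters, while a point near a cluster centre such as `0.2` is not flagged. The quantile of 0.99 is an arbitrary choice for the sketch, not a value taken from the paper.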