Ensemble unsupervised autoencoders and Gaussian mixture model for cyberattack detection

Bibliographic Details
Published in:Information processing & management Vol. 59; no. 2; p. 102844
Main Authors: An, Peng, Wang, Zhiyuan, Zhang, Chunjiong
Format: Journal Article
Language:English
Published: Oxford Elsevier Ltd 01.03.2022
Elsevier Science Ltd
ISSN:0306-4573, 1873-5371
Description
Summary:Previous studies have adopted unsupervised machine learning with dimension reduction for cyberattack detection, but these approaches are limited in their ability to perform robust anomaly detection on high-dimensional, sparse data. Most of them assume homogeneous parameters with a specific Gaussian distribution for each domain and do not test robustness to data skewness. This paper proposes unsupervised ensemble autoencoders connected to a Gaussian mixture model (GMM) to adapt to multiple domains regardless of the skewness of each domain. In the hidden space of the ensemble autoencoder, the attention-based latent representation and the reconstructed features with minimum error are utilized. The expectation maximization (EM) algorithm is used to estimate the sample density in the GMM. When the estimated sample density exceeds the threshold learned in the training phase, the sample is identified as an outlier related to an attack anomaly. Finally, the ensemble autoencoder and the GMM are jointly optimized, which transforms the optimization of the objective function into a Lagrangian dual problem. Experiments conducted on three public data sets validate that the performance of the proposed model is competitive with the selected anomaly detection baselines.
•An ensemble framework for a multichannel network anomaly detection model that combines deep autoencoders and the GMM.
•A robust optimization version of EM3 for multiple domains, which transforms the optimization problem of the objective function into a Lagrangian dual.
•We derive the formulas, analyze convergence, and prove that the model is stable and robust.
•To the best of our knowledge, this is the first work that applies algorithms to both differentiated data domains and data distributions.
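The pipeline described in the summary (compress each sample, combine the latent code with the reconstruction error, fit a GMM with EM, and flag samples against a threshold learned on training data) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: a linear PCA projection stands in for the attention-based ensemble autoencoder, the GMM uses diagonal covariances, the two stages are fitted separately instead of jointly, and the 99th-percentile threshold, the component count `k=2`, and all function names are hypothetical choices. The summary words the rule as the estimated density exceeding the threshold; the sketch scores samples by negative log-likelihood, so that a higher score means a more anomalous sample, which is also an assumption.

```python
import numpy as np

def linear_autoencoder(X_train, latent_dim=2):
    """PCA stand-in for an autoencoder (assumption): returns a featurizer
    that maps X to [latent code, reconstruction error]."""
    mu = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = vt[:latent_dim].T  # (n_features, latent_dim)

    def featurize(X):
        z = (X - mu) @ W                      # encode
        recon = z @ W.T + mu                  # decode
        err = np.linalg.norm(X - recon, axis=1, keepdims=True)
        return np.hstack([z, err])

    return featurize

def component_log_density(X, weights, means, variances):
    """log(weight_k) + log N(x | mean_k, diag(var_k)), shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return np.log(weights)[None, :] - 0.5 * (quad + log_det[None, :])

def sample_energy(X, weights, means, variances):
    """Negative log-likelihood under the mixture (higher = more anomalous)."""
    log_p = component_log_density(X, weights, means, variances)
    m = log_p.max(axis=1, keepdims=True)
    return -(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1)))

def fit_gmm_em(X, k=2, n_iter=50, seed=0, eps=1e-6):
    """Plain EM for a diagonal-covariance GMM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample
        log_p = component_log_density(X, weights, means, variances)
        m = log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p - m)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, eps)
    return weights, means, variances

# Synthetic data (assumption): normal traffic lies near a 2-D subspace,
# attacks are scattered off that subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 8))
train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 8))
attacks = 3.0 * rng.normal(size=(20, 8))

featurize = linear_autoencoder(train, latent_dim=2)
train_features = featurize(train)
params = fit_gmm_em(train_features, k=2)

# Threshold learned on training data only (99th percentile is an assumption).
train_energy = sample_energy(train_features, *params)
threshold = np.percentile(train_energy, 99)
flags = sample_energy(featurize(attacks), *params) > threshold
print("flagged attacks:", int(flags.sum()), "of", len(flags))
```

In this sketch the reconstruction error carries most of the signal, because the synthetic attacks lie off the subspace the featurizer was fitted on. A joint optimization of the encoder and the mixture, as the summary describes, would replace the two separate fitting steps.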
DOI:10.1016/j.ipm.2021.102844