On IoT intrusion detection based on data augmentation for enhancing learning on unbalanced samples

Bibliographic Details
Published in: Future Generation Computer Systems, Vol. 133, pp. 213-227
Main Authors: Zhang, Ying; Liu, Qiang
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.08.2022
ISSN: 0167-739X, 1872-7115
Description
Summary: Internet of Things (IoT) security is a prerequisite for the rapid development of the IoT and its contribution to human well-being. Machine learning-based intrusion detection systems (IDS) offer good protection. However, attack information is difficult to identify in massive amounts of data, and when some attack types have too few samples, the detection model performs poorly on them. To address minority-class attack detection, this paper combines deep learning methods with statistical ideas and proposes an IoT intrusion detection method based on an Improved Conditional Variational Autoencoder (ICVAE) and the Borderline Synthetic Minority Oversampling Technique (BSM), called ICVAE-BSM. First, an auxiliary network is introduced into the Conditional Variational Autoencoder (CVAE) to adjust the probability distribution output by the encoder and to learn the posterior distribution of each class, so that in the latent space samples of the same class are concentrated and samples of different classes are scattered. Then, BSM adaptively synthesizes new borderline (edge) latent variables in the ICVAE latent space, and these are fed to the ICVAE decoder to generate representative new samples that balance the data set. Finally, the encoder output is connected to a Softmax classifier, which is fine-tuned on a mix of original and generated samples to improve its generalization in detecting minority-class intrusions. The model is simulated and evaluated on the NSL-KDD, CIC-IDS2017, and CSE-CIC-IDS2018 data sets, and the experimental results show that the proposed method more effectively improves the accuracy of IoT attack detection under unbalanced samples.
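The oversampling step above applies Borderline-SMOTE to latent vectors rather than raw traffic features. The following is a minimal NumPy sketch of the standard borderline-1 variant, applied to a matrix of latent codes. The function name `borderline_smote`, the neighborhood sizes `m` and `k`, Euclidean distance, and uniform interpolation are all assumptions for illustration; the paper's adaptive synthesis rule may differ, and the ICVAE decoder that would map the synthetic latent vectors back to feature space is not shown.

```python
import numpy as np


def borderline_smote(Z, y, minority, m=5, k=5, n_new=10, seed=0):
    """Borderline-SMOTE (borderline-1 variant) on latent vectors.

    Hypothetical helper: the BSM settings used in the paper are not given
    in the abstract. Z is an (n, d) array of latent codes, y an (n,) array
    of labels. Returns an (n_new, d) array of synthetic minority codes, or
    an empty (0, d) array if no borderline minority points exist.
    """
    rng = np.random.default_rng(seed)
    Z_min = Z[y == minority]

    # Distances from every minority point to every point in the data set.
    d_all = np.linalg.norm(Z_min[:, None, :] - Z[None, :, :], axis=-1)
    # Drop the first (zero-distance) entry, normally the point itself.
    nn_all = np.argsort(d_all, axis=1)[:, 1:m + 1]
    n_major = (y[nn_all] != minority).sum(axis=1)

    # DANGER set: at least half, but not all, of the m neighbors are majority.
    # Points with only majority neighbors are treated as noise and skipped.
    danger = np.where((n_major >= m / 2) & (n_major < m))[0]
    if danger.size == 0:
        return np.empty((0, Z.shape[1]))

    # k nearest minority neighbors of each minority point (excluding itself).
    d_min = np.linalg.norm(Z_min[:, None, :] - Z_min[None, :, :], axis=-1)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]

    synth = np.empty((n_new, Z.shape[1]))
    for i in range(n_new):
        p = rng.choice(danger)        # a borderline minority point
        q = rng.choice(nn_min[p])     # one of its minority neighbors
        gap = rng.random()            # interpolation factor in [0, 1)
        synth[i] = Z_min[p] + gap * (Z_min[q] - Z_min[p])
    return synth
```

Only minority points with mixed neighborhoods seed new codes, so synthetic samples land along the class boundary rather than deep inside the minority cluster or among isolated outliers, which is the behavior the highlights below describe for avoiding noisy and redundant samples.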
•Unlike traditional dimensionality reduction with unsupervised discriminative models, feature extraction is performed by the supervised generative model ICVAE, and prior information is supplied to adjust the probability distribution of the data, avoiding the homogenization of all samples into a single mode.
•The proposed feature extraction model introduces an auxiliary network that steers the encoder's output probability distribution, so that each sample's posterior distribution approaches its class-specific ("exclusive") distribution; this also alleviates the KL-vanishing problem during CVAE training.
•After dimensionality reduction, the proposed scheme yields clear boundaries in the latent-space distribution. Boundary examples in the latent space are oversampled with BSM and fed to the decoder to balance the data set, avoiding noise samples and redundant samples that waste resources and degrade detection efficiency.
•The proposed model can effectively detect minority-class intrusions in unbalanced IoT samples.
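The highlights state that the auxiliary network pulls each sample's posterior toward a class-specific distribution. The abstract does not give the ICVAE objective, so the sketch below only illustrates the idea under stated assumptions: the auxiliary network's output is modeled as a fixed per-class prior mean (the hypothetical `class_centers` table), the prior covariance is the identity, and the encoder posterior is a diagonal Gaussian. Under those assumptions the KL term has the closed form KL = 0.5 * sum_j (sigma_j^2 + (mu_j - c_yj)^2 - 1 - log sigma_j^2), which is small when a latent code sits near its own class center and large near another class's center.

```python
import numpy as np


def kl_to_class_prior(mu, log_var, prior_mu):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(prior_mu, I) ).

    mu, log_var: (n, d) mean and log-variance produced by the encoder.
    prior_mu:    (n, d) class-specific prior means (identity covariance is
                 an assumption of this sketch, not taken from the paper).
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(var + (mu - prior_mu) ** 2 - 1.0 - log_var, axis=1)


# Hypothetical class centers standing in for the auxiliary network's output:
# one prior mean per class (0 = normal traffic, 1 = a minority attack class).
class_centers = np.array([[0.0, 0.0],
                          [3.0, 3.0]])

mu = np.array([[2.9, 3.1]])     # encoder mean for one sample of class 1
log_var = np.zeros((1, 2))      # unit variance
kl_own = kl_to_class_prior(mu, log_var, class_centers[[1]])    # about 0.01
kl_other = kl_to_class_prior(mu, log_var, class_centers[[0]])  # about 9.01
```

With a zero prior mean the function reduces to the usual VAE KL term against N(0, I); giving each class its own mean is one simple way to keep same-class codes clustered and different classes apart in the latent space.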
DOI: 10.1016/j.future.2022.03.007