Research on Abnormal Traffic Detection in Industrial Control Network Based on CVAE-CatBoost

Bibliographic Details
Published in: Ji suan ji gong cheng Vol. 49; no. 5; pp. 173-180
Main Authors: ZHANG Zixuan, ZONG Xuejun, HE Kan, LIAN Lian
Format: Journal Article
Language: Chinese, English
Published: Editorial Office of Computer Engineering 01.05.2023
ISSN: 1000-3428
Description
Summary: To detect abnormal traffic in Industrial Control Networks (ICN), a new detection model based on a Conditional Variational Autoencoder (CVAE) and the Categorical Features Gradient Boosting (CatBoost) algorithm is proposed to address the unbalanced data distribution and low detection rate of existing models. CVAE uses label information as a constraint to control the category of generated samples. The CatBoost algorithm overcomes gradient bias by introducing unbiased estimation, improves prediction accuracy, and reduces the risk of overfitting by adopting various tree growth modes. CVAE is used to augment the data, expanding rare attack samples to build balanced datasets with a uniform class distribution. CatBoost then serves as the abnormal traffic detection model, accurately identifying attack samples such as DoS and Fuzzers and outputting the classification results. The experimental results show that on the UNSW-NB15 dataset, after data augmentation with CVAE, CatBoost improves the F1 value by 25.16 percentage points on average, and the overall precision, recall, and F1 value reach 87.85%, 87.87%, and 87.86%, respectively. On the ZYELL_NCTU NetTraffic_1.0 dataset, after data augmentation with CVAE, CatBoost improves the F1 value by 16.32% on average, and the overall precision, recall, and F1 value all reach 99.85%. The proposed model can effectively avoid data imbalance problems and has better detection performance and generalization ability than machine learning and deep learning algorithms such as K-Nearest Neighbor (KNN), Random Forest (RF), and Convolutional Neural Network (CNN).
DOI: 10.19678/j.issn.1000-3428.0065478
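
The summary describes two steps that can be illustrated without the models themselves: deciding how many samples of each rare class a label-conditioned generator would need to produce to balance the dataset, and scoring a classifier with precision, recall, and F1. The following is a minimal sketch under stated assumptions, not the authors' code. The function names (`synthetic_counts`, `per_class_scores`, `macro_f1`) and the toy labels are hypothetical, balancing to the size of the largest class is an assumption, and macro averaging is an assumption because the record does not say how the overall values were averaged. CVAE generation and CatBoost training are not shown.

```python
from collections import Counter


def synthetic_counts(labels):
    """Return, per class, how many synthetic samples are needed to match
    the largest class (assumed balancing target; hypothetical helper)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} from paired label lists."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[cls] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_scores(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy, made-up labels: one majority class and two rare attack classes.
    train_labels = ["Normal"] * 6 + ["DoS"] * 3 + ["Fuzzers"] * 1
    print(synthetic_counts(train_labels))

    y_true = ["Normal", "Normal", "DoS", "DoS", "Fuzzers", "Fuzzers"]
    y_pred = ["Normal", "Normal", "DoS", "Normal", "Fuzzers", "Fuzzers"]
    print(per_class_scores(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

In a pipeline like the one summarized above, the per-class counts would serve as the number of samples to request from the generator for each label, and the scoring functions would be applied to the classifier's predictions on a held-out test set before and after augmentation.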