SHAPE: A Simultaneous Header and Payload Encoding Model for Encrypted Traffic Classification


Bibliographic details
Published in: IEEE eTransactions on network and service management, Volume 20, Issue 2, pp. 1993-2012
Main authors: Dai, Jianbang; Xu, Xiaolong; Gao, Honghao; Wang, Xinheng; Xiao, Fu
Format: Journal Article
Language: English
Published: New York, IEEE, 1 June 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1932-4537
Description
Summary: Many end-to-end deep learning algorithms for classifying malicious and encrypted traffic have been proposed in recent years. End-to-end deep learning algorithms require a large number of samples to train a model. However, it is hard for existing methods to fully utilize heterogeneous multimodal input. To this end, we propose the SHAPE model (simultaneous header and payload encoding), which mainly consists of two autoencoders and a Transformer layer, to improve model performance. The two autoencoders extract features from heterogeneous inputs (the statistical information of each packet and byte-form payloads) and convert them into a unified format; then, a lightweight Transformer layer further extracts the relationships hidden in the simultaneous inputs. In particular, the autoencoder for payload feature extraction contains several depthwise separable residual convolution layers for efficient feature extraction and a token squeeze layer to reduce the computing overhead of the Transformer layer. Moreover, we train the SHAPE model using deep metric learning, which pulls samples with the same class label together and separates samples from different classes in the low-dimensional embedding space. Thus, the SHAPE model can naturally handle multitask classification, and its performance is approximately 5.43% better than the current SOTA on the traffic type classification of the ISCX-VPN2016 dataset, at the cost of 9.31 times the training time and 1.45 times the inference time.
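The deep metric learning objective mentioned in the summary can be illustrated with a small sketch. The abstract does not say which metric-learning loss SHAPE uses, so the triplet margin loss below is an assumption chosen only because it is a common way to pull same-class embeddings together and push different-class embeddings apart. The function names, the 2-D embeddings, and the class labels are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (an assumed choice, not confirmed by the source).

    The loss is zero once the negative sample is at least `margin` farther
    from the anchor than the positive sample is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def batch_triplet_loss(embeddings, labels, margin=1.0):
    """Average triplet loss over every valid (anchor, positive, negative) triplet."""
    losses = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # A positive must be a different sample with the same label.
            if p == a or labels[p] != labels[a]:
                continue
            for q in range(n):
                # A negative must carry a different label.
                if labels[q] == labels[a]:
                    continue
                losses.append(
                    triplet_loss(embeddings[a], embeddings[p], embeddings[q], margin)
                )
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical 2-D embeddings for two made-up traffic classes.
embeddings = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0), (3.0, 4.1)]
labels = ["chat", "chat", "streaming", "streaming"]
well_separated = batch_triplet_loss(embeddings, labels)
```

In this toy batch the two classes already form tight, distant clusters, so every triplet satisfies the margin and the averaged loss is zero. A batch with overlapping classes would give a positive loss, and gradient descent on the encoder would then move the embeddings apart.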
DOI: 10.1109/TNSM.2022.3213758