Handling partially labeled network data: A semi-supervised approach using stacked sparse autoencoder

Detailed bibliography
Published in: Computer networks (Amsterdam, Netherlands : 1999), Vol. 207; p. 108742 - 12
Main authors: Aouedi, Ons; Piamrat, Kandaraj; Bagadthey, Dhruvjyoti
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 22.04.2022
Elsevier Sequoia S.A
Elsevier
ISSN: 1389-1286, 1872-7069
Description
Abstract: Network traffic analytics has become a crucial task for better understanding and managing network resources, especially in the network softwarization era, where this concept can easily be implemented with network function virtualization. Many approaches have been proposed to improve the performance of traffic classification. However, new types of traffic emerge every day, and they are generally not labeled, which poses a new challenge. Moreover, how to classify traffic accurately using a limited amount of labeled data, or partially labeled data, is another important concern, since labeling data is often difficult and time-consuming. To address these issues, we reformulate traffic classification as a semi-supervised learning problem in which supervised learning (using labeled data) and unsupervised learning (using unlabeled data) are combined. To this end, this paper presents a stacked sparse autoencoder (SSAE) based semi-supervised deep-learning model for traffic classification. The main motivations for this approach are: (i) unlabeled data is often abundant and easily available; (ii) the classification performance of the whole model can be greatly improved when a large amount of unlabeled traffic is included in the training process; (iii) there is a limit to how much human effort can be devoted to the labeling problem. To evaluate our approach, an empirical study was conducted on a real dataset, and the results indicate that using a large amount of unlabeled data in the SSAE pre-training phase can significantly improve the classification performance of the whole model. Furthermore, the proposed approach is compared against other representative machine-learning and deep-learning models: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Multi-Layer Perceptron (MLP), eXtreme Gradient Boosting (XGBoost), and Autoencoder.
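The two-phase scheme described in the abstract (unsupervised pretraining of a stacked sparse autoencoder on unlabeled traffic, then supervised classification using the few labels available) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the sigmoid units, the KL-divergence sparsity penalty with target activation `rho`, all hyperparameter values and layer sizes, and the softmax classifier trained on frozen features (rather than end-to-end fine-tuning) are illustrative choices. The random arrays stand in for real flow features scaled to [0, 1], and helper names such as `train_sparse_ae`, `encode`, and `train_softmax` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_ae(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4, lr=0.5, epochs=200, seed=0):
    """Train one sparse autoencoder layer on unlabeled rows of X (values in [0, 1]).

    Minimizes 0.5 * mean squared reconstruction error + L2 weight decay
    + beta * sum_j KL(rho || rho_hat_j), where rho_hat_j is the mean activation
    of hidden unit j (an assumed, commonly used sparsity penalty).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(0.0, 0.1, (n, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n))
    b2 = np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)    # encoder activations
        Xr = sigmoid(H @ W2 + b2)   # decoder reconstruction
        # Mean activation per hidden unit, clipped to avoid division by zero.
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        # Backpropagation through the decoder, then the encoder.
        d2 = (Xr - X) * Xr * (1 - Xr)
        sparse_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
        d1 = (d2 @ W2.T + sparse_grad) * H * (1 - H)
        W2 -= lr * (H.T @ d2 / m + lam * W2)
        b2 -= lr * d2.mean(axis=0)
        W1 -= lr * (X.T @ d1 / m + lam * W1)
        b1 -= lr * d1.mean(axis=0)
    return W1, b1  # keep only the encoder; the decoder is discarded


def encode(X, layers):
    """Pass X through the stack of pretrained encoder layers."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X


def train_softmax(F, y, n_classes, lr=0.5, lam=1e-4, epochs=300):
    """Multinomial logistic regression on encoded features F (labeled rows only)."""
    m, d = F.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        Z = F @ W + b
        Z -= Z.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / m
        W -= lr * (F.T @ G + lam * W)
        b -= lr * G.sum(axis=0)
    return W, b


# Usage with placeholder data: many unlabeled rows, few labeled ones.
rng = np.random.default_rng(42)
X_unlabeled = rng.random((500, 20))
X_labeled = rng.random((40, 20))
y_labeled = rng.integers(0, 3, size=40)

# Phase 1: greedy layer-wise pretraining on unlabeled data only.
layers, H = [], X_unlabeled
for n_hidden in (16, 8):
    W, b = train_sparse_ae(H, n_hidden)
    layers.append((W, b))
    H = sigmoid(H @ W + b)

# Phase 2: supervised classifier on top of the frozen encoder, labeled data only.
Wc, bc = train_softmax(encode(X_labeled, layers), y_labeled, n_classes=3)
pred = np.argmax(encode(X_labeled, layers) @ Wc + bc, axis=1)
```

The point of the layer-wise phase is that the unlabeled rows alone shape the learned representation; only the small classifier on top ever sees labels, which mirrors the motivation that labeling effort is the scarce resource.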
DOI: 10.1016/j.comnet.2021.108742