Self-Training of Cyber-Threat Classification Model With Threat-Payload Centric Augmentation

Bibliographic details
Published in: IEEE Transactions on Industrial Informatics, Vol. 20, No. 10, pp. 11740-11750
Main authors: Kim, Jae-Yeol; Kwon, Hyuk-Yoon
Format: Journal Article
Language: English
Publication details: Piscataway: IEEE, 01.10.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1551-3203, 1941-0050
Description
Summary: Deep learning (DL)-based threat classification has been investigated as a way to analyze threat events effectively and to minimize the human resources required in security operation centers (SOC). However, human labeling (HL) by SOC security analysts is still necessary for accurate classification of, and responses to, unknown threat events and new threat trends. This labeling process consumes significant time and effort, which limits the construction of an efficient SOC response system, especially for immediate responses to newly generated large-scale threats. To address this, we propose PLC-TPA, a new self-training method for threat classification models. We present a self-training pipeline based on pseudo-labeling with confidence (PLC) for automatic labeling of newly captured threats. To resolve the class imbalance that arises during self-training, we present a novel threat-payload centric augmentation (TPA) method that considers threat-payload characteristics. Through extensive experiments, we show that PLC-TPA achieves high threat-classification accuracy, with F1-scores of about 0.973 to 0.988, an improvement of 10.9% to 13.4% over other self-training methods. Notably, PLC-TPA performs comparably even to HL, with significantly faster response times. These findings suggest that the proposed PLC-TPA can substantially improve DL-based SOC environments. PLC-TPA also outperforms the existing methods by 8.3% to 17.4% in comparative experiments.
DOI: 10.1109/TII.2024.3413300