Enhancing the insertion of NOP instructions to obfuscate malware via deep reinforcement learning

Bibliographic details
Published in: Computers & security, vol. 113, p. 102543
Main authors: Gibert, Daniel; Fredrikson, Matt; Mateu, Carles; Planes, Jordi; Le, Quan
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Ltd, 1 February 2022
Elsevier Sequoia S.A
ISSN: 0167-4048, 1872-6208
Description
Summary:
• It explores the vulnerability of classifiers to the dead code insertion technique.
• It proposes a reinforcement learning framework to bypass malware classifiers.
• A Q-network selects the optimal positions at which to insert the NOP instructions.
• A time-distributed layer is used to determine the optimal locations of the insertions.
• The approach greatly improves the evasion rate in comparison to a random agent.

Current state-of-the-art research on malware detection and classification centers on the design, implementation, and deployment of systems powered by machine learning, because of its ability to generalize to never-before-seen malware families and polymorphic mutations. However, it has been shown that machine learning models, in particular deep neural networks, lack robustness against crafted inputs (adversarial examples). In this work, we investigate the vulnerability of a state-of-the-art shallow convolutional neural network malware classifier to the dead code insertion technique. We propose a general framework powered by a Double Q-network to induce misclassification over malware families. The framework trains an agent, through a convolutional neural network, to select the optimal positions in a code sequence at which to insert dead code instructions so that the machine learning classifier mislabels the resulting executable. The experiments show that the proposed method significantly reduces the accuracy of the classifier to 56.53%, while achieving an evasion rate of 100% for samples belonging to the Kelihos_ver3, Simda, and Kelihos_ver1 families. In addition, the average number of instructions needed to mislabel malware decreased by 33% in comparison to a random agent.
DOI: 10.1016/j.cose.2021.102543