Ransomware classification using patch-based CNN and self-attention network on embedded N-grams of opcodes

Bibliographic details
Published in: Future generation computer systems, vol. 110, pp. 708-720
Main authors: Zhang, Bin; Xiao, Wentao; Xiao, Xi; Sangaiah, Arun Kumar; Zhang, Weizhe; Zhang, Jiajia
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 September 2020
ISSN:0167-739X, 1872-7115
Online access: Full text
Description
Abstract: Ransomware is a special kind of malware that causes irreversible data loss and incurs enormous economic costs, so detecting it is an urgent task. Furthermore, to enable appropriate defenses and reduce analysts’ workloads, ransomware must be not only detected but also classified into families. Some ransomware, e.g., fingerprinting ransomware, can fingerprint the run-time environment and evade dynamic analysis. To detect this type of ransomware and to speed up processing compared with dynamic analysis, we propose a static analysis framework based on N-gram opcodes with deep learning. Since opcode sequences obtained from executable files carry rich context and semantic information, we treat an opcode sequence like a natural-language sentence. However, the lengths of the N-gram opcode sequences vary widely, from hundreds to millions. The extremely long sequences are far beyond the capacity of most deep-neural-network-based sequence classifiers, such as RNNs. To address this problem and improve the scalability of our framework, we partition the N-gram sequence into many patches and feed each patch into a self-attention-based convolutional neural network named SA-CNN. The outputs of the SA-CNNs are then concatenated and fed into a bi-directional self-attention network to obtain the ransomware classification result. Compared with CNNs and RNNs, the self-attention mechanism shows a strong ability to capture complementary information about distance-aware dependencies with high computational efficiency. To the best of our knowledge, we are the first to apply the self-attention mechanism to opcode sequences for ransomware classification. With the partition strategy and the power of the self-attention network, the framework captures rich context and semantic information from extremely long sequences. Comprehensive experiments on a real-world dataset show that the proposed framework outperforms state-of-the-art methods in many evaluations.
• We construct a novel ransomware classification framework based on opcode sequences obtained through static analysis. Our method can classify fingerprinting ransomware and speeds up processing compared with dynamic analysis.
• Our framework scales well to extremely long opcode sequences.
• We are the first to exploit self-attention on top of static analysis.
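The preprocessing and attention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the patch length, and the padding token are assumptions made for illustration, the per-patch SA-CNN encoder is not reproduced, and the attention function is a plain scaled dot-product with no learned projections (queries, keys, and values are all the input vectors).

```python
import math
from typing import List, Sequence, Tuple


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return the sliding-window N-grams of an opcode sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def partition_into_patches(items: Sequence, patch_len: int, pad) -> List[list]:
    """Split a sequence into fixed-length patches, padding the last one.

    patch_len and pad are hypothetical parameters, not values from the paper.
    """
    if patch_len < 1:
        raise ValueError("patch_len must be at least 1")
    patches = []
    for start in range(0, len(items), patch_len):
        patch = list(items[start:start + patch_len])
        patch.extend([pad] * (patch_len - len(patch)))
        patches.append(patch)
    return patches


def self_attention(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention without learned weights.

    Each output row is a softmax-weighted average of all input rows, so every
    position can draw on every other position regardless of distance.
    """
    if not vectors:
        return []
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) / total for j in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    opcodes = ["push", "mov", "sub", "call", "mov", "add", "ret"]
    grams = opcode_ngrams(opcodes, n=2)
    patches = partition_into_patches(grams, patch_len=4, pad=("<pad>", "<pad>"))
    print(len(grams), "bigrams in", len(patches), "patches")
    # Stand-in patch embeddings; a real system would produce these with a model.
    print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

Fixed-length patches keep the cost of each per-patch encoder bounded no matter how long the full opcode sequence is, which is the scalability argument the abstract makes; only the much shorter sequence of patch outputs has to be attended over as a whole.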
DOI:10.1016/j.future.2019.09.025