A Universal Malicious Documents Static Detection Framework Based on Feature Generalization

In this study, Portable Document Format (PDF), Word, Excel, Rich Test format (RTF) and image documents are taken as the research objects to study a static and fast method by which to detect malicious documents. Malicious PDF and Word document features are abstracted and extended, which can be used t...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Applied sciences Jg. 11; H. 24; S. 12134
Hauptverfasser:	Lu, Xiaofeng, Wang, Fei, Jiang, Cheng, Lio, Pietro
Format:	Journal Article
Sprache:	Englisch
Veröffentlicht:	Basel MDPI AG 01.12.2021
Schlagworte:	Active learning Anti-virus software Design feature generalization JavaScript Keywords machine learning malicious document detection Methods static detection
ISSN:	2076-3417, 2076-3417
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In this study, Portable Document Format (PDF), Word, Excel, Rich Test format (RTF) and image documents are taken as the research objects to study a static and fast method by which to detect malicious documents. Malicious PDF and Word document features are abstracted and extended, which can be used to detect other types of documents. A universal static detection framework for malicious documents based on feature generalization is then proposed. The generalized features include specification check errors, the structure path, code keywords, and the number of objects. The proposed method is verified on two datasets, and is compared with Kaspersky, NOD32, and McAfee antivirus software. The experimental results demonstrate that the proposed method achieves good performance in terms of the detection accuracy, runtime, and scalability. The average F1-score of all types of documents is found to be 0.99, and the average detection time of a document is 0.5926 s, which is at the same level as the compared antivirus software.
Bibliographie:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	2076-3417 2076-3417
DOI:	10.3390/app112412134