Malware detection framework based on graph variational autoencoder extracted embeddings from API-call graphs

Bibliographic details
Published in:	PeerJ. Computer science Vol. 8; p. e988
Main author:	Gunduz, Hakan
Format:	Journal Article
Language:	English
Published:	United States PeerJ. Ltd 18.05.2022 PeerJ, Inc PeerJ Inc
Subjects:	API-call graphs; Application programming interface; Artificial Intelligence; Data Mining and Machine Learning; Feature extraction; Feature selection; Graph embeddings; Graph Variational Autoencoder; Graphs; Machine learning; Malware; Malware detection; Mobile and Ubiquitous Computing; Model accuracy; Neural Networks; Recursive Feature Elimination; Security and Privacy; Spyware
ISSN:	2376-5992

Description
Summary:	Malware compromises the confidentiality and integrity of information, causing material and moral damage to institutions and individuals. This study proposed a malware detection model based on API-call graphs and used a Graph Variational Autoencoder (GVAE) to reduce the size of graph node features extracted from Android apk files. GVAE-reduced embeddings were fed to linear-based (SVM) and ensemble-based (LightGBM) models to finalize the malware detection process. To validate the effectiveness of the GVAE-reduced features, recursive feature elimination (RFE) and Fisher score (FS) were applied to select informative feature sets of the same sizes as the GVAE-reduced embeddings. Among the RFE and FS selections, LightGBM with 50 RFE-selected features achieved the highest accuracy (0.907) and F-measure (0.852). When GVAE-reduced embeddings were used for classification, the accuracy of both models increased by approximately 4%. F-measure rates rose by a similar amount, which directly indicated an improvement in the discriminative power of the models. The final experiment, which combined the strengths of RFE selection and GVAE, improved performance over GVAE-reduced embeddings alone: with 30 relevant features selected by RFE from the combination of all GVAE embeddings, LightGBM achieved an accuracy of 0.967.
DOI:	10.7717/peerj-cs.988
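
The Fisher score (FS) selection named in the summary can be illustrated with a minimal sketch. It assumes the commonly used definition (between-class scatter of a feature divided by its within-class scatter); the record does not state the paper's exact formulation, so this may differ from the author's implementation. The function names `fisher_scores` and `select_top_k` and the toy data are hypothetical, introduced only for illustration.

```python
from statistics import fmean


def fisher_scores(X, y):
    """Return one Fisher score per feature column (assumed definition).

    score_j = sum_k n_k * (mean_kj - mean_j)^2 / sum_k n_k * var_kj
    where k runs over classes and j over features.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # between-class scatter for feature j
        within = 0.0   # within-class scatter for feature j
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = fmean(values)
            class_var = fmean([(v - class_mean) ** 2 for v in values])
            between += len(values) * (class_mean - overall_mean) ** 2
            within += len(values) * class_var
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfectly separating or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Return the (sorted) indices of the k features with the highest score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): only feature 0 separates the two classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.0, 4.5, 0.2],
    [3.1, 3.5, 0.1],
    [2.9, 4.0, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

top_features = select_top_k(X, y, k=1)
```

In the study's setting, `k` would be set to the size of the GVAE-reduced embedding so that the selected feature set and the embedding are compared at equal dimensionality before being passed to a classifier such as SVM or LightGBM.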