MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection.

Uloženo v:
Podrobná bibliografie
Název: MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection.
Autoři: Makkawy, Saleh J., De Lucia, Michael J., Barner, Kenneth E.
Zdroj: Journal of Cybersecurity & Privacy; Dec2025, Vol. 5 Issue 4, p109, 27p
Témata: MALWARE, SIGNAL detection, DEEP learning, N-gram models (Computational linguistics), ENTROPY
Reviews & Products: ANDROID (Operating system)
Abstrakt: As technology advances, developers continually create innovative solutions to enhance smartphone security. However, the rapid spread of Android malware poses significant threats to devices and sensitive data. The Android Operating System (OS)'s open-source nature and Software Development Kit (SDK) availability mainly contribute to this alarming growth. Conventional malware detection methods, such as signature-based, static, and dynamic analysis, face challenges in detecting obfuscated techniques, including encryption, packing, and compression, in malware. Although developers have created several visualization techniques for malware detection using deep learning (DL), they often fail to accurately identify the critical malicious features of malware. This research introduces MalVis, a unified visualization framework that integrates entropy and N-gram analysis to emphasize meaningful structural and anomalous operational patterns within the malware bytecode. By addressing significant limitations of existing visualization methods, such as insufficient feature representation, limited interpretability, small dataset sizes, and restricted data access, MalVis delivers enhanced detection capabilities, particularly for obfuscated and previously unseen (zero-day) malware. The framework leverages the MalVis dataset introduced in this work, a publicly available large-scale dataset comprising more than 1.3 million visual representations in nine malware classes and one benign class. A comprehensive comparative evaluation was performed against existing state-of-the-art visualization techniques using leading convolutional neural network (CNN) architectures, MobileNet-V2, DenseNet201, ResNet50, VGG16, and Inception-V3. To further boost classification performance and mitigate overfitting, the outputs of these models were combined using eight distinct ensemble strategies. To address the issue of imbalanced class distribution in the multiclass dataset, we employed an undersampling technique to ensure balanced learning across all types of malware. MalVis achieved superior results, with 95% accuracy, 90% F1-score, 92% precision, 89% recall, 87% Matthews Correlation Coefficient (MCC), and 98% Receiver Operating Characteristic Area Under Curve (ROC-AUC). These findings highlight the effectiveness of MalVis in providing interpretable and accurate representation features for malware detection and classification, making it valuable for research and real-world security applications. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Cybersecurity & Privacy is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Databáze: Complementary Index
Popis
Abstrakt:As technology advances, developers continually create innovative solutions to enhance smartphone security. However, the rapid spread of Android malware poses significant threats to devices and sensitive data. The Android Operating System (OS)'s open-source nature and Software Development Kit (SDK) availability mainly contribute to this alarming growth. Conventional malware detection methods, such as signature-based, static, and dynamic analysis, face challenges in detecting obfuscated techniques, including encryption, packing, and compression, in malware. Although developers have created several visualization techniques for malware detection using deep learning (DL), they often fail to accurately identify the critical malicious features of malware. This research introduces MalVis, a unified visualization framework that integrates entropy and N-gram analysis to emphasize meaningful structural and anomalous operational patterns within the malware bytecode. By addressing significant limitations of existing visualization methods, such as insufficient feature representation, limited interpretability, small dataset sizes, and restricted data access, MalVis delivers enhanced detection capabilities, particularly for obfuscated and previously unseen (zero-day) malware. The framework leverages the MalVis dataset introduced in this work, a publicly available large-scale dataset comprising more than 1.3 million visual representations in nine malware classes and one benign class. A comprehensive comparative evaluation was performed against existing state-of-the-art visualization techniques using leading convolutional neural network (CNN) architectures, MobileNet-V2, DenseNet201, ResNet50, VGG16, and Inception-V3. To further boost classification performance and mitigate overfitting, the outputs of these models were combined using eight distinct ensemble strategies. To address the issue of imbalanced class distribution in the multiclass dataset, we employed an undersampling technique to ensure balanced learning across all types of malware. MalVis achieved superior results, with 95% accuracy, 90% F1-score, 92% precision, 89% recall, 87% Matthews Correlation Coefficient (MCC), and 98% Receiver Operating Characteristic Area Under Curve (ROC-AUC). These findings highlight the effectiveness of MalVis in providing interpretable and accurate representation features for malware detection and classification, making it valuable for research and real-world security applications. [ABSTRACT FROM AUTHOR]
ISSN:2624800X
DOI:10.3390/jcp5040109