Toward a Robust Detection of PowerShell Malware against Code Mixing and Obfuscation by Using Sentence Transformer and Similarity Learning.

Detailed bibliography
Title:	Toward a Robust Detection of PowerShell Malware against Code Mixing and Obfuscation by Using Sentence Transformer and Similarity Learning.
Authors:	Fu, Zhiwei, Song, Leo, Ding, Steven, Alaca, Furkan, Acharya, Sudipta
Source:	ACM Transactions on Privacy & Security; Nov2025, Vol. 28 Issue 4, p1-23, 23p
Subjects:	MALWARE, MACHINE learning, PATTERN perception, MALWARE prevention, SCRIPTING languages (Computer science), ARTIFICIAL neural networks
Abstract:	Embedded PowerShell commands or scripts are among the most popular malware payloads. For malware that prioritizes stealthiness, such as fileless malware, PowerShell's access to Windows API functions without additional libraries makes it useful for evading detection. Detecting malicious PowerShell scripts and commands is an open challenge for proactive endpoint protection due to three major issues: (1) The malicious commands are usually hidden in a long script beyond the processing limit of typical machine learning models. (2) They are usually mixed with bulky benign scripts. (3) Script obfuscation can easily conceal their potential matching signatures. In this article, we introduce a novel model addressing these challenges. It incorporates similarity learning, sentence transformer, sliding window method, and stochastic gradient descent (SGD) classifier. Our key insight is that malicious PowerShell code, particularly when obfuscated, exhibits semantic and statistical deviations from benign administrative usage, and these deviations can be captured by contrastive sentence embeddings without the need for de-obfuscation or handcrafted features. We operate this insight through a Siamese similarity learning framework that improves robustness against Out-of-Vocabulary tokens due to unseen code obfuscation methods. The sliding window method enables the model to handle long scripts, and the SGD classifier evaluates segment-level maliciousness. Our model achieves accuracies of 99.01%, 97.59%, 98.70%, and 99.73% across multiple obfuscated and mixed script benchmarks, outperforming existing baselines by over 30% in all cases. This work demonstrates a scalable and effective strategy for robust PowerShell malware detection in real-world scenarios. [ABSTRACT FROM AUTHOR]
	Copyright of ACM Transactions on Privacy & Security is the property of Association for Computing Machinery and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Complementary Index

Description
ISSN:	2471-2566
DOI:	10.1145/3771542