SABLM-VD: Vulnerability detection with a semantic-aware binary language model.

Bibliographic Details
Title:	SABLM-VD: Vulnerability detection with a semantic-aware binary language model.
Authors:	Li, Qinghao¹ (AUTHOR), Liu, Tieming¹ (AUTHOR), Liu, Wei¹ (AUTHOR), Tang, Yonghe¹ (AUTHOR), Liu, Chunling¹ (AUTHOR), Dong, Weiyu¹ (AUTHOR) dongxinbaoer@163.com
Source:	Information & Software Technology. Feb 2026, Vol. 190, pN.PAG-N.PAG. 1p.
Subject Terms:	*COMPUTER security, BINARY codes, DEEP learning, LANGUAGE models, PENETRATION testing (Computer security)
Abstract:	Static detection of binary code vulnerabilities based on deep learning is an important research field in computer security. In existing methods, those mainly based on code similarity often focus on overall similarity while neglecting crucial and subtle semantic differences in the code, leading to potential false positives. Meanwhile, methods mainly based on vulnerability patterns still face challenges in learning semantic features. To address the above problems, we conduct research on x86-64 and ARM architecture binaries and propose a hybrid-granularity assembly language tokenization method. Then, we propose a BERT-based semantic-aware binary language model, SABLM. It is a solution that simultaneously embeds the data transfer semantics, arithmetic logic semantics, and control flow of assembly instructions into a BERT-based language model, effectively perceiving the semantics of assembly code. Based on the semantic feature representation of assembly code by SABLM, we incorporate pseudocode features and construct a cross-architecture vulnerability detection framework, SABLM-VD. We evaluate SABLM-VD on the NVD dataset and the SARD dataset. The results show that SABLM-VD outperforms the state-of-the-art baseline methods in terms of F1-score, precision, recall, and accuracy, with SABLM-VD achieving F1-scores of 79.33% on NVD (Mixed), 100.00% on SARD (x86-64), and 100.00% on SARD (ARM). Ablation studies and component analysis demonstrate the effectiveness of each component of SABLM-VD. Visualizations and real-world applications further confirm the advantages of SABLM-VD. Our research indicates that SABLM-VD, which is based on the semantic-aware binary language model SABLM and pseudocode features, can effectively detect binary code vulnerabilities, warranting further research in this direction. [ABSTRACT FROM AUTHOR]
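The abstract reports results in terms of F1-score, precision, recall, and accuracy. As a reading aid, the following is a minimal sketch of how these four metrics are conventionally computed from binary confusion counts. It is not taken from the paper: the function name `detection_metrics` and the example counts are hypothetical, chosen only for illustration.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion counts.

    tp: vulnerable functions correctly flagged
    fp: non-vulnerable functions incorrectly flagged
    tn: non-vulnerable functions correctly passed
    fn: vulnerable functions missed
    (Illustrative helper; not the authors' evaluation code.)
    """
    # Precision: share of flagged samples that are truly vulnerable.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly vulnerable samples that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Accuracy: share of all samples classified correctly.
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    for name, value in detection_metrics(tp=8, fp=2, tn=9, fn=1).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, an F1-score of 100.00% requires both zero false positives and zero false negatives on the evaluated set.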
Database:	Business Source Index

ISSN:	09505849
DOI:	10.1016/j.infsof.2025.107959