SABLM-VD: Vulnerability detection with a semantic-aware binary language model.
| Title: | SABLM-VD: Vulnerability detection with a semantic-aware binary language model. |
|---|---|
| Authors: | Li, Qinghao; Liu, Tieming; Liu, Wei; Tang, Yonghe; Liu, Chunling; Dong, Weiyu (dongxinbaoer@163.com) |
| Source: | Information & Software Technology. Feb 2026, Vol. 190 (no pagination given). |
| Subjects: | *COMPUTER security, BINARY codes, DEEP learning, LANGUAGE models, PENETRATION testing (Computer security) |
| Abstract: | Static detection of binary code vulnerabilities based on deep learning is an important research field in computer security. In existing methods, those mainly based on code similarity often focus on overall similarity while neglecting crucial and subtle semantic differences in the code, leading to potential false positives. Meanwhile, methods mainly based on vulnerability patterns still face challenges in learning semantic features. To address the above problems, we conduct research on x86-64 and ARM architecture binaries and propose a hybrid-granularity assembly language tokenization method. Then, we propose a BERT-based semantic-aware binary language model, SABLM. It is a solution that simultaneously embeds the data transfer semantics, arithmetic logic semantics, and control flow of assembly instructions into a BERT-based language model, effectively perceiving the semantics of assembly code. Based on the semantic feature representation of assembly code by SABLM, we incorporate pseudocode features and construct a cross-architecture vulnerability detection framework, SABLM-VD. We evaluate SABLM-VD on the NVD dataset and the SARD dataset. The results show that SABLM-VD outperforms the state-of-the-art baseline methods in terms of F1-score, precision, recall, and accuracy, with SABLM-VD achieving F1-scores of 79.33% on NVD (Mixed), 100.00% on SARD (x86-64), and 100.00% on SARD (ARM). Ablation studies and component analysis demonstrate the effectiveness of each component of SABLM-VD. Visualizations and real-world applications further confirm the advantages of SABLM-VD. Our research indicates that SABLM-VD, which is based on the semantic-aware binary language model SABLM and pseudocode features, can effectively detect binary code vulnerabilities, warranting further research in this direction. [ABSTRACT FROM AUTHOR] |
| Copyright of Information & Software Technology is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) | |
| Database: | Business Source Index |
| DOI: | 10.1016/j.infsof.2025.107959 |
| ISSN: | 0950-5849 (print) |
| Language: | English |
| Publication type: | Academic Journal |
| Accession number: | 189790933 |
| Permalink: | https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=bsx&AN=189790933 |