SABLM-VD: Vulnerability detection with a semantic-aware binary language model.

Detailed bibliography
Title: SABLM-VD: Vulnerability detection with a semantic-aware binary language model.
Authors: Li, Qinghao (AUTHOR), Liu, Tieming (AUTHOR), Liu, Wei (AUTHOR), Tang, Yonghe (AUTHOR), Liu, Chunling (AUTHOR), Dong, Weiyu (AUTHOR) dongxinbaoer@163.com
Source: Information & Software Technology. Feb2026, Vol. 190, pN.PAG-N.PAG. 1p.
Subjects: *COMPUTER security, BINARY codes, DEEP learning, LANGUAGE models, PENETRATION testing (Computer security)
Abstract: Static detection of binary code vulnerabilities based on deep learning is an important research field in computer security. In existing methods, those mainly based on code similarity often focus on overall similarity while neglecting crucial and subtle semantic differences in the code, leading to potential false positives. Meanwhile, methods mainly based on vulnerability patterns still face challenges in learning semantic features. To address the above problems, we conduct research on x86-64 and ARM architecture binaries and propose a hybrid-granularity assembly language tokenization method. Then, we propose a BERT-based semantic-aware binary language model, SABLM. It is a solution that simultaneously embeds the data transfer semantics, arithmetic logic semantics, and control flow of assembly instructions into a BERT-based language model, effectively perceiving the semantics of assembly code. Based on the semantic feature representation of assembly code by SABLM, we incorporate pseudocode features and construct a cross-architecture vulnerability detection framework, SABLM-VD. We evaluate SABLM-VD on the NVD dataset and the SARD dataset. The results show that SABLM-VD outperforms the state-of-the-art baseline methods in terms of F1-score, precision, recall, and accuracy, with SABLM-VD achieving F1-scores of 79.33% on NVD (Mixed), 100.00% on SARD (x86-64), and 100.00% on SARD (ARM). Ablation studies and component analysis demonstrate the effectiveness of each component of SABLM-VD. Visualizations and real-world applications further confirm the advantages of SABLM-VD. Our research indicates that SABLM-VD, which is based on the semantic-aware binary language model SABLM and pseudocode features, can effectively detect binary code vulnerabilities, warranting further research in this direction. [ABSTRACT FROM AUTHOR]
Copyright of Information & Software Technology is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Business Source Index
FullText Text:
  Availability: 0
Header DbId: bsx
DbLabel: Business Source Index
An: 189790933
RelevancyScore: 1338
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 1337.99206542969
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: SABLM-VD: Vulnerability detection with a semantic-aware binary language model.
– Name: Author
  Label: Authors
  Group: Au
  Data: Li, Qinghao (AUTHOR), Liu, Tieming (AUTHOR), Liu, Wei (AUTHOR), Tang, Yonghe (AUTHOR), Liu, Chunling (AUTHOR), Dong, Weiyu (AUTHOR) dongxinbaoer@163.com
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Information & Software Technology. Feb2026, Vol. 190, pN.PAG-N.PAG. 1p.
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: *COMPUTER security, BINARY codes, DEEP learning, LANGUAGE models, PENETRATION testing (Computer security)
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=bsx&AN=189790933
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1016/j.infsof.2025.107959
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 1
        StartPage: N.PAG
    Subjects:
      – SubjectFull: COMPUTER security
        Type: general
      – SubjectFull: BINARY codes
        Type: general
      – SubjectFull: DEEP learning
        Type: general
      – SubjectFull: LANGUAGE models
        Type: general
      – SubjectFull: PENETRATION testing (Computer security)
        Type: general
    Titles:
      – TitleFull: SABLM-VD: Vulnerability detection with a semantic-aware binary language model.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Li, Qinghao
      – PersonEntity:
          Name:
            NameFull: Liu, Tieming
      – PersonEntity:
          Name:
            NameFull: Liu, Wei
      – PersonEntity:
          Name:
            NameFull: Tang, Yonghe
      – PersonEntity:
          Name:
            NameFull: Liu, Chunling
      – PersonEntity:
          Name:
            NameFull: Dong, Weiyu
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 02
              Text: Feb2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 09505849
          Numbering:
            – Type: volume
              Value: 190
          Titles:
            – TitleFull: Information & Software Technology
              Type: main