GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.

Detailed bibliography
Title: GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.
Authors: Du, Jiang; Wei, Qiang; Wang, Yisen; Bai, Xingyu
Source: Entropy; Apr2025, Vol. 27 Issue 4, p392, 23p
Subjects: GRAPH neural networks, NATURAL language processing, LANGUAGE models, BINARY codes, DEEP learning
Abstract: Recent advances in graph neural networks have transformed structural pattern learning in domains ranging from social network analysis to biomolecular modeling. Nevertheless, practical deployments in mission-critical scenarios such as binary code similarity detection face two fundamental obstacles: first, the inherent noise in graph construction processes, exemplified by incomplete control flow edges during binary function recovery; second, the substantial distribution discrepancies caused by cross-architecture instruction set variations. Conventional GNN architectures demonstrate severe performance degradation under such low signal-to-noise ratio conditions and cross-domain operational environments, particularly in security-sensitive vulnerability identification tasks where feature instability or domain shifts could trigger critical false judgments. To address these challenges, we propose GBsim, a novel approach that combines graph neural networks with natural language processing. GBsim employs a cross-architecture language model to transform binary functions into semantic graphs, leverages a multilayer GCN for structural feature extraction, and employs a Transformer layer to integrate semantic information, generating robust cross-architecture embeddings that maintain high performance despite significant distribution shifts. Extensive experiments on a large-scale cross-architecture dataset show that GBsim achieves an MRR of 0.901 and a Recall@1 of 0.831, outperforming state-of-the-art methods. In real-world vulnerability detection tasks, GBsim achieves an average recall rate of 81.3% on a 1-day vulnerability dataset, demonstrating its practical effectiveness in identifying security threats and outperforming existing methods by 2.1%. This performance advantage stems from GBsim's ability to maximize information preservation across architectural boundaries, enhancing model robustness in the presence of noise and distribution shifts. [ABSTRACT FROM AUTHOR]
Copyright of Entropy is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
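
Note on the reported metrics: the abstract cites MRR and Recall@1 without defining them. The sketch below assumes the standard retrieval definitions, where each query function has exactly one true match in a candidate pool and `ranks` holds the 1-based position of that match. The function names, the example ranks, and the single-match assumption are illustrative only; this record does not describe the paper's actual evaluation protocol or pool size.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose true match appears in the top k results."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for four queries (not data from the paper).
example_ranks = [1, 1, 2, 4]
print(mean_reciprocal_rank(example_ranks))  # (1 + 1 + 0.5 + 0.25) / 4 = 0.6875
print(recall_at_k(example_ranks, k=1))      # 2 of 4 queries ranked first = 0.5
```

Under these definitions, MRR is never lower than Recall@1, since every query ranked first contributes 1 to both sums and every other query still contributes a positive reciprocal rank. This is consistent with the reported values (0.901 versus 0.831).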
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=pmc&term=1099-4300[TA]+AND+392[PG]+AND+2025[PDAT]
    Name: FREE - PubMed Central (ISSN based link)
    Category: fullText
    Text: Full Text
    Icon: https://imageserver.ebscohost.com/NetImages/iconPdf.gif
    MouseOverText: Check PubMed Central for the article full text.
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=10994300&ISBN=&volume=27&issue=4&date=20250401&spage=392&pages=392-414&title=Entropy&atitle=GBsim%3A%20A%20Robust%20GCN-BERT%20Approach%20for%20Cross-Architecture%20Binary%20Code%20Similarity%20Analysis.&aulast=Du%2C%20Jiang&id=DOI:10.3390/e27040392
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Du%20J
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 184749359
RelevancyScore: 1023
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 1023.07330322266
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.
– Name: Author
  Label: Authors
  Group: Au
  Data: Du, Jiang; Wei, Qiang; Wang, Yisen; Bai, Xingyu
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Entropy; Apr2025, Vol. 27 Issue 4, p392, 23p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: GRAPH neural networks; NATURAL language processing; LANGUAGE models; BINARY codes; DEEP learning
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=184749359
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.3390/e27040392
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 23
        StartPage: 392
    Subjects:
      – SubjectFull: GRAPH neural networks
        Type: general
      – SubjectFull: NATURAL language processing
        Type: general
      – SubjectFull: LANGUAGE models
        Type: general
      – SubjectFull: BINARY codes
        Type: general
      – SubjectFull: DEEP learning
        Type: general
    Titles:
      – TitleFull: GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Du, Jiang
      – PersonEntity:
          Name:
            NameFull: Wei, Qiang
      – PersonEntity:
          Name:
            NameFull: Wang, Yisen
      – PersonEntity:
          Name:
            NameFull: Bai, Xingyu
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 04
              Text: Apr2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 10994300
          Numbering:
            – Type: volume
              Value: 27
            – Type: issue
              Value: 4
          Titles:
            – TitleFull: Entropy
              Type: main
ResultId 1