GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.
| Title: | GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis. |
|---|---|
| Authors: | Du, Jiang; Wei, Qiang; Wang, Yisen; Bai, Xingyu |
| Source: | Entropy; Apr2025, Vol. 27 Issue 4, p392, 23p |
| Subjects: | GRAPH neural networks, NATURAL language processing, LANGUAGE models, BINARY codes, DEEP learning |
| Abstract: | Recent advances in graph neural networks have transformed structural pattern learning in domains ranging from social network analysis to biomolecular modeling. Nevertheless, practical deployments in mission-critical scenarios such as binary code similarity detection face two fundamental obstacles: first, the inherent noise in graph construction, exemplified by incomplete control flow edges during binary function recovery; second, the substantial distribution discrepancies caused by cross-architecture instruction set variations. Conventional GNN architectures degrade severely under such low signal-to-noise ratio conditions and cross-domain operational environments, particularly in security-sensitive vulnerability identification tasks where feature instability or domain shifts could trigger critical false judgments. To address these challenges, we propose GBsim, a novel approach that combines graph neural networks with natural language processing. GBsim employs a cross-architecture language model to transform binary functions into semantic graphs, leverages a multilayer GCN for structural feature extraction, and uses a Transformer layer to integrate semantic information, generating robust cross-architecture embeddings that maintain high performance despite significant distribution shifts. Extensive experiments on a large-scale cross-architecture dataset show that GBsim achieves an MRR of 0.901 and a Recall@1 of 0.831, outperforming state-of-the-art methods. In real-world vulnerability detection tasks, GBsim achieves an average recall rate of 81.3% on a 1-day vulnerability dataset, demonstrating its practical effectiveness in identifying security threats and outperforming existing methods by 2.1%. This performance advantage stems from GBsim's ability to maximize information preservation across architectural boundaries, enhancing model robustness in the presence of noise and distribution shifts. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Entropy is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
| DOI: | 10.3390/e27040392 |
| ISSN: | 1099-4300 |
| Language: | English |
| Publication type: | Academic Journal |
| Accession number: | 184749359 |
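
The abstract reports results as MRR and Recall@1. As a reading aid, the sketch below shows how these two ranking metrics are conventionally computed for a function-search task. It is a minimal illustration of the standard definitions, not the authors' evaluation code: the record does not describe the candidate pool size or the evaluation protocol, and the function identifiers and toy rankings are hypothetical.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], true_id: str) -> float:
    """Return 1/rank of the true match (1-based), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == true_id:
            return 1.0 / position
    return 0.0


def mrr(rankings: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """Mean reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, t) for r, t in zip(rankings, truths)) / len(truths)


def recall_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose true match appears in the top k candidates."""
    hits = sum(1 for r, t in zip(rankings, truths) if t in r[:k])
    return hits / len(truths)


# Hypothetical toy data: each query function has one true match in the pool,
# and each ranking lists candidates from most to least similar.
rankings = [
    ["f1", "f7", "f3"],  # true match f1 is ranked 1st
    ["f9", "f2", "f4"],  # true match f2 is ranked 2nd
    ["f5", "f6", "f3"],  # true match f3 is ranked 3rd
    ["f8", "f4", "f0"],  # true match f4 is ranked 2nd
]
truths = ["f1", "f2", "f3", "f4"]

toy_mrr = mrr(rankings, truths)                    # (1 + 1/2 + 1/3 + 1/2) / 4
toy_recall_1 = recall_at_k(rankings, truths, k=1)  # 1 of 4 queries hit at rank 1
print(f"MRR = {toy_mrr:.3f}, Recall@1 = {toy_recall_1:.3f}")
```

Under these definitions, Recall@1 counts only queries where the true match is ranked first, while MRR also gives partial credit for matches ranked lower, which is why a reported MRR is never below the Recall@1 on the same rankings.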