GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.
| Title: | GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis. |
|---|---|
| Authors: | Du J; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China., Wei Q; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China., Wang Y; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China., Bai X; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China. |
| Source: | Entropy (Basel, Switzerland) [Entropy (Basel)] 2025 Apr 07; Vol. 27 (4). Date of Electronic Publication: 2025 Apr 07. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Info: | Publisher: MDPI Country of Publication: Switzerland NLM ID: 101243874 Publication Model: Electronic Cited Medium: Internet ISSN: 1099-4300 (Electronic) Linking ISSN: 10994300 NLM ISO Abbreviation: Entropy (Basel) Subsets: PubMed not MEDLINE |
| Imprint Name(s): | Original Publication: Basel, Switzerland : MDPI, 1999- |
| Abstract: | Recent advances in graph neural networks have transformed structural pattern learning in domains ranging from social network analysis to biomolecular modeling. Nevertheless, practical deployments in mission-critical scenarios such as binary code similarity detection face two fundamental obstacles: first, the inherent noise in graph construction, exemplified by incomplete control flow edges during binary function recovery; second, the substantial distribution discrepancies caused by cross-architecture instruction set variations. Conventional GNN architectures degrade severely under such low signal-to-noise conditions and cross-domain operating environments, particularly in security-sensitive vulnerability identification tasks where feature instability or domain shifts could trigger critical false judgments. To address these challenges, we propose GBsim, a novel approach that combines graph neural networks with natural language processing. GBsim employs a cross-architecture language model to transform binary functions into semantic graphs, leverages a multilayer GCN for structural feature extraction, and uses a Transformer layer to integrate semantic information, generating robust cross-architecture embeddings that maintain high performance despite significant distribution shifts. Extensive experiments on a large-scale cross-architecture dataset show that GBsim achieves an MRR of 0.901 and a Recall@1 of 0.831, outperforming state-of-the-art methods. In real-world vulnerability detection tasks, GBsim achieves an average recall rate of 81.3% on a 1-day vulnerability dataset, outperforming existing methods by 2.1% and demonstrating its practical effectiveness in identifying security threats. This performance advantage stems from GBsim's ability to maximize information preservation across architectural boundaries, enhancing model robustness in the presence of noise and distribution shifts. |
| Contributed Indexing: | Keywords: binary code similarity analysis; cross-architecture embedding; graph neural network robustness; hybrid deep learning |
| Entry Date(s): | Date Created: 20250426 Latest Revision: 20250429 |
| Update Code: | 20250429 |
| PubMed Central ID: | PMC12025366 |
| DOI: | 10.3390/e27040392 |
| PMID: | 40282627 |
| Database: | MEDLINE |
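
The abstract reports results as MRR (mean reciprocal rank) and Recall@1. As a reading aid, the following minimal sketch shows how these two retrieval metrics are commonly computed from similarity scores. It is not taken from the paper: the function names, the toy scores, and the tie-handling rule (only strictly higher scores push the true match down) are illustrative assumptions.

```python
from typing import List, Sequence


def rank_of_match(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the true match among the candidates.

    `scores[i]` is a hypothetical similarity between the query function
    and candidate i; a higher score means more similar.
    """
    target = scores[true_index]
    # Assumption: ties do not penalize the true match.
    return 1 + sum(1 for s in scores if s > target)


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank: the average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose true match is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy data (made up for illustration): three queries, each with a list of
# candidate scores and the index of the true match.
queries = [
    ([0.9, 0.2, 0.4], 0),       # true match ranked 1st
    ([0.3, 0.8, 0.5], 2),       # true match ranked 2nd
    ([0.1, 0.7, 0.6, 0.9], 0),  # true match ranked 4th
]
ranks: List[int] = [rank_of_match(s, i) for s, i in queries]

print("ranks:", ranks)
print("MRR:", round(mrr(ranks), 3))
print("Recall@1:", round(recall_at_k(ranks, 1), 3))
```

Under this definition, an MRR of 0.901 means the true match sits at or very near the top of the ranking on average, and a Recall@1 of 0.831 means it is the top-ranked candidate for 83.1% of queries.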