GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.

Bibliographic Details
Title: GBsim: A Robust GCN-BERT Approach for Cross-Architecture Binary Code Similarity Analysis.
Authors: Du J; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China., Wei Q; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China., Wang Y; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China., Bai X; School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China.
Source: Entropy (Basel, Switzerland) [Entropy (Basel)] 2025 Apr 07; Vol. 27 (4). Date of Electronic Publication: 2025 Apr 07.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: MDPI Country of Publication: Switzerland NLM ID: 101243874 Publication Model: Electronic Cited Medium: Internet ISSN: 1099-4300 (Electronic) Linking ISSN: 10994300 NLM ISO Abbreviation: Entropy (Basel) Subsets: PubMed not MEDLINE
Imprint Name(s): Original Publication: Basel, Switzerland : MDPI, 1999-
Abstract: Recent advances in graph neural networks have transformed structural pattern learning in domains ranging from social network analysis to biomolecular modeling. Nevertheless, practical deployments in mission-critical scenarios such as binary code similarity detection face two fundamental obstacles: first, the inherent noise in graph construction processes, exemplified by incomplete control flow edges during binary function recovery; second, the substantial distribution discrepancies caused by cross-architecture instruction set variations. Conventional GNN architectures demonstrate severe performance degradation under such low signal-to-noise ratio conditions and cross-domain operational environments, particularly in security-sensitive vulnerability identification tasks where feature instability or domain shifts could trigger critical false judgments. To address these challenges, we propose GBsim, a novel approach that combines graph neural networks with natural language processing. GBsim employs a cross-architecture language model to transform binary functions into semantic graphs, leverages a multilayer GCN for structural feature extraction, and employs a Transformer layer to integrate semantic information, generating robust cross-architecture embeddings that maintain high performance despite significant distribution shifts. Extensive experiments on a large-scale cross-architecture dataset show that GBsim achieves an MRR of 0.901 and a Recall@1 of 0.831, outperforming state-of-the-art methods. In real-world vulnerability detection tasks, GBsim achieves an average recall rate of 81.3% on a 1-day vulnerability dataset, outperforming existing methods by 2.1% and demonstrating its practical effectiveness in identifying security threats. This performance advantage stems from GBsim's ability to maximize information preservation across architectural boundaries, enhancing model robustness in the presence of noise and distribution shifts.
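The abstract reports MRR and Recall@1 without saying how they are computed. The sketch below uses the standard definitions of these retrieval metrics and is not taken from the paper: for each query function, candidates are ranked by similarity score, and the rank of the true match is recorded. The function names and the toy scores are hypothetical and serve only as illustration.

```python
from typing import Sequence


def rank_of_match(scores: Sequence[float], match_index: int) -> int:
    """Return the 1-based rank of the true match among scored candidates.

    Assumption: higher score means more similar; ties are ranked optimistically.
    """
    target = scores[match_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose true match is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy data (invented): each entry is (similarity scores, index of the true match).
queries = [
    ([0.9, 0.2, 0.1], 0),   # true match ranked 1st
    ([0.3, 0.8, 0.5], 2),   # true match ranked 2nd
    ([0.7, 0.6, 0.1], 2),   # true match ranked 3rd
    ([0.1, 0.4, 0.95], 2),  # true match ranked 1st
]

ranks = [rank_of_match(scores, idx) for scores, idx in queries]
mrr = mean_reciprocal_rank(ranks)
recall_1 = recall_at_k(ranks, k=1)
print(f"ranks={ranks} MRR={mrr:.3f} Recall@1={recall_1:.3f}")
```

Under these definitions, an MRR of 0.901 with a Recall@1 of 0.831 would mean the true match is ranked first for about 83% of queries and usually sits near the top otherwise.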
Contributed Indexing: Keywords: binary code similarity analysis; cross-architecture embedding; graph neural network robustness; hybrid deep learning
Entry Date(s): Date Created: 20250426 Latest Revision: 20250429
Update Code: 20250429
PubMed Central ID: PMC12025366
DOI: 10.3390/e27040392
PMID: 40282627
Database: MEDLINE