FIN: Boosting binary code embedding by normalizing function inlinings.
| Title: | FIN: Boosting binary code embedding by normalizing function inlinings. |
|---|---|
| Authors: | Amouei, Mohammadhossein (AUTHOR); Fung, Benjamin C.M. (AUTHOR) ben.fung@mcgill.ca; Charland, Philippe (AUTHOR) |
| Source: | Journal of Systems & Software. Jan2026, Vol. 231, pN.PAG-N.PAG. 1p. |
| Subjects: | MACHINE learning, FEATURE extraction, FLOWGRAPHS, COST functions, PROGRAM transformation |
| Abstract: | Binary code similarity detection (BCSD) is essential for identifying similar code sections across different programs, regardless of their source languages, compilation options, or underlying architectures. It plays a crucial role in areas such as code plagiarism detection, malware analysis, and vulnerability discovery. However, BCSD faces significant challenges due to compiler optimizations, such as function inlining, which alter the binary structure. Existing rule-based strategies for expanding function control flow graphs (CFGs) have limited success because they identify inlined call sites with low precision and recall. In this study, we present a detailed investigation of function inlining and propose an AI-driven solution to expand CFGs, offering improvements for BCSD approaches. We designed a set of features for a machine learning algorithm to identify functions at the O0 and O1 optimization levels that may be inlined at the higher levels O2 and O3, without prior knowledge of the optimization level. By utilizing this information to expand function CFGs, we observed significant enhancements in the performance of state-of-the-art binary code representation learning techniques. Experimental results show that our proposed method increases the effectiveness of representation learning approaches by up to 21.54%. Additionally, our experiments show that our proposed method can improve the true positive rate in identifying known vulnerabilities. Highlights: • Inlined functions impair binary similarity detection effectiveness. • Tool-assisted function inlining normalization boosts BCSD across optimization levels. • Average callee–caller distance is key for choosing optimal manual inlining candidates. • Tool-assisted function inlining normalization improves vulnerability detection. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Systems & Software is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Business Source Index |
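
Note: the abstract refers to "expanding" a function's CFG at call sites that are likely to be inlined at higher optimization levels. The following is only a minimal sketch of what such an expansion could look like on a toy graph; it is not the authors' implementation. The dictionary-based CFG representation, the `inline_callee` helper, and the block names are all hypothetical, and the decision of which call sites to expand (made by a machine learning model in the paper) is simply taken as given here.

```python
from typing import Dict, List

# Hypothetical toy representation: block name -> successor block names.
CFG = Dict[str, List[str]]


def inline_callee(caller: CFG, call_block: str, callee: CFG,
                  callee_entry: str, prefix: str) -> CFG:
    """Return a copy of `caller` with `callee` spliced in after `call_block`.

    Assumes every successor named in `callee` is itself a key of `callee`,
    and that callee blocks without successors are its return blocks.
    """
    # Prefix callee block names so they cannot collide with caller blocks.
    rename = {block: f"{prefix}.{block}" for block in callee}
    expanded = {block: list(succs) for block, succs in caller.items()}

    # Where control went after the call in the original caller.
    return_targets = expanded[call_block]

    # The call block now flows into the callee's entry block.
    expanded[call_block] = [rename[callee_entry]]

    for block, succs in callee.items():
        if succs:
            expanded[rename[block]] = [rename[s] for s in succs]
        else:
            # Callee return blocks continue at the caller's original successors.
            expanded[rename[block]] = list(return_targets)
    return expanded


# Toy example: block "B" of the caller contains a call to function f.
caller = {"A": ["B"], "B": ["C"], "C": []}
callee = {"entry": ["then", "else"], "then": ["exit"],
          "else": ["exit"], "exit": []}

expanded = inline_callee(caller, "B", callee, "entry", prefix="f")
for block, succs in expanded.items():
    print(block, "->", succs)
```

Applying the same expansion to the unoptimized (O0/O1) build of a function is one way to make its graph look more like the O2/O3 build, in which the compiler has already inlined the callee; that is the sense of "normalizing" inlinings used in the title.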