Compiler-provenance identification in obfuscated binaries using vision transformers

Detailed bibliography
Title: Compiler-provenance identification in obfuscated binaries using vision transformers
Authors: Khan, Wasif; Alrabaee, Saed; Al-kfairy, Mousa; Tang, Jie; Raymond Choo, Kim Kwang
Source: All Works
Publisher information: ZU Scholars
Year of publication: 2024
Subjects: Binary code analysis, Compiler provenance, Malware analysis, Reverse engineering, Computer Sciences
Description: Extracting compiler-provenance-related information (e.g., the source of a compiler, its version, its optimization settings, and compiler-related functions) is crucial for binary-analysis tasks such as function fingerprinting, detecting code clones, and determining authorship attribution. However, the presence of obfuscation techniques has complicated efforts to automate such extraction. In this paper, we propose an efficient and resilient approach to provenance identification in obfuscated binaries using advanced pre-trained computer-vision models. To achieve this, we transform the program binaries into images and apply a two-layer approach for compiler and optimization prediction. Extensive results from experiments performed on a large-scale dataset show that the proposed method can achieve an accuracy of over 98% for both obfuscated and deobfuscated binaries.
Document type: text
File description: application/pdf
Language: unknown
Relation: https://zuscholars.zu.ac.ae/works/6635; https://zuscholars.zu.ac.ae/context/works/article/7672/viewcontent/1_s2.0_S2666281724000830_main.pdf
DOI: 10.1016/j.fsidi.2024.301764
Availability: https://zuscholars.zu.ac.ae/works/6635
https://doi.org/10.1016/j.fsidi.2024.301764
https://zuscholars.zu.ac.ae/context/works/article/7672/viewcontent/1_s2.0_S2666281724000830_main.pdf
Rights: http://creativecommons.org/licenses/by-nc-nd/4.0/
Accession number: edsbas.E4055BE0
Database: BASE
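The description summarizes the method only at a high level: binaries are turned into images, and a two-layer approach predicts the compiler and then the optimization setting. As a hedged illustration, the sketch below shows one common way to render a binary as a grayscale image (one byte per pixel, fixed row width, zero-padded last row). It also shows how a two-layer prediction could route an image first to a compiler classifier and then to a compiler-specific optimization-level classifier. The row width of 256, the names bytes_to_grayscale and two_stage_predict, and the toy classifiers are hypothetical assumptions, not the authors' implementation. The paper's actual image encoding and vision-transformer models are described in the full text.

```python
import math
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # rows of 8-bit grayscale pixel values (0-255)


def bytes_to_grayscale(data: bytes, width: int = 256) -> Image:
    """Map each byte to one grayscale pixel, filling rows left to right.

    Assumption (hypothetical): a fixed row width, with the last row
    zero-padded so every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = max(1, math.ceil(len(data) / width))
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def two_stage_predict(
    image: Image,
    compiler_clf: Callable[[Image], str],
    opt_clfs: Dict[str, Callable[[Image], str]],
) -> Tuple[str, str]:
    """Layer 1 predicts the compiler; layer 2 uses a model specific to
    that compiler to predict the optimization level (hypothetical routing)."""
    compiler = compiler_clf(image)
    opt_level = opt_clfs[compiler](image)
    return compiler, opt_level


if __name__ == "__main__":
    # Toy stand-ins for trained models (hypothetical, for illustration only).
    img = bytes_to_grayscale(b"\x7fELF\x02\x01\x01", width=4)
    print(img)  # [[127, 69, 76, 70], [2, 1, 1, 0]]
    compiler_clf = lambda im: "gcc" if im[0][0] == 0x7F else "clang"
    opt_clfs = {"gcc": lambda im: "O2", "clang": lambda im: "O0"}
    print(two_stage_predict(img, compiler_clf, opt_clfs))  # ('gcc', 'O2')
```

In practice each callable would wrap a trained image classifier (the title indicates vision transformers); the stub lambdas above only demonstrate how the second layer is selected by the first layer's output.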