Are We There Yet? Filling the Gap Between Binary Similarity Analysis and Binary Software Composition Analysis

Bibliographic Details
Title: Are We There Yet? Filling the Gap Between Binary Similarity Analysis and Binary Software Composition Analysis
Authors: Wang, Huaijin; Liu, Zhibo; Wang, Shuai; Wang, Ying; Tang, Qiyi; Nie, Sen; Wu, Shi
Publisher Information: Institute of Electrical and Electronics Engineers Inc.
Publication Year: 2024
Collection: The Hong Kong University of Science and Technology: HKUST Institutional Repository
Subject Terms: Binary code analysis, Binary similarity analysis, Software composition analysis, Software reuse, Vulnerability detection
Description: Software composition analysis (SCA) has attracted attention from both industry and academia in recent years. Given a piece of program source code, SCA facilitates extracting certain components from the input program and matching the extracted components with open-source software (OSS) libraries. Despite the prosperous development of SCA, binary SCA (BSCA) is highly challenging and still underdeveloped. The few available BSCA solutions are either closed source (for commercial use) or suffer from low performance. Nevertheless, a related line of research, binary similarity analysis (BSA), which decides the similarity between two pieces of binary code, has been progressively developed in academia for decades. De facto BSA techniques, often based on deep learning, efficiently analyze large-scale executables with high accuracy. This study explores bridging the gap between state-of-the-art (SOTA) BSA and BSCA. We spent considerable manual effort building the first large real-world benchmark dataset, containing over 55 million lines of C/C++ code. We then establish our BSCA pipeline by extending and calibrating the SOTA SCA pipeline. In particular, we concretize the key procedure of BSCA, namely matching a binary component with OSS, using six SOTA BSA techniques. Evaluation using our benchmark dataset reveals that simply employing BSA in BSCA yields less-than-desirable accuracy, as BSCA faces unique challenges. After inspecting the failed cases, we propose three enhancements whose hybrid usage improves the F1 score of BSCA by over 30% and outperforms SOTA commercial BSCA software. Our experiment on 1-day vulnerability detection demonstrates our BSCA framework's effectiveness. We also discuss several open challenges and potential solutions to augment BSCA solutions.
Document Type: conference object
Language: English
Relation: https://doi.org/10.1109/EuroSP60621.2024.00034; http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=001304430300026
DOI: 10.1109/EuroSP60621.2024.00034
Availability: http://repository.hkust.edu.hk/ir/Record/1783.1-147321
https://doi.org/10.1109/EuroSP60621.2024.00034
http://lbdiscover.ust.hk/uresolver?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rfr_id=info:sid/HKUST:SPI&rft.genre=article&rft.issn=&rft.volume=&rft.issue=&rft.date=2024&rft.spage=&rft.aulast=Wang&rft.aufirst=Huaijin&rft.atitle=Are+We+There+Yet%3F+Filling+the+Gap+Between+ML-Based+Binary+Similarity+Analysis+and+Binary+Software+Composition+Analysis&rft.title=
http://www.scopus.com/record/display.url?eid=2-s2.0-85203675956&origin=inward
http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=001304430300026
Accession Number: edsbas.A0305367
Database: BASE