A cross-language and cross-binary type approach to binary-source software composition analysis using BM25.
| Title: | A cross-language and cross-binary type approach to binary-source software composition analysis using BM25. |
|---|---|
| Authors: | Kim, Jong-Wouk (AUTHOR) jw.kim@kangwon.ac.kr; Choi, Mi-Jung (AUTHOR) mjchoi@kangwon.ac.kr |
| Source: | International Journal of Information Security. Dec 2025, Vol. 24, Issue 6, pp. 1-17 (17 pages). |
| Abstract: | Software composition analysis (SCA) involves analyzing open-source software (OSS) components used in software development to identify license compliance issues and security vulnerabilities. It helps developers mitigate the legal and security risks associated with OSS, ensuring safer and more reliable software. Traditional SCA methods often require users to upload their source code to an SCA server for inspection, which then generates reports on component usage. However, vendors have significant concerns about this approach because source code often includes sensitive information like proprietary algorithms, core business logic, and confidential user data. To address these challenges, various SCA techniques, such as binary-binary SCA and binary-source SCA, have been proposed, though most are limited to specific programming languages. Existing SCA frameworks primarily focus on C/C++ and Java. This paper introduces the first binary-source SCA method capable of analyzing three binary types across five programming languages (C/C++, Objective-C, Swift, Go). Our approach utilizes BM25-based text tokens, significantly reducing computational cost while maintaining detection performance. Empirical results demonstrate that our method achieves up to 81.4% recall in identifying reused OSS components, providing an efficient and scalable solution for secure software development. Additionally, this study highlights the limitations of the proposed method and suggests future research directions to address these challenges. [ABSTRACT FROM AUTHOR] |
| Database: | Academic Search Index |
| ISSN: | 1615-5262 |
| DOI: | 10.1007/s10207-025-01148-3 |
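
The abstract names BM25 as the ranking function applied to text tokens but does not describe the tokenization, the parameters, or the matching pipeline. As a rough illustration only, the sketch below implements generic Okapi BM25 (with the non-negative IDF variant) and ranks candidate OSS components by how well their source-derived tokens match tokens taken from a binary. The component names, the token lists, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` (name -> token list) against the query.

    Generic Okapi BM25; k1 and b are common defaults, not values from the paper.
    """
    n_docs = len(corpus)
    avg_len = sum(len(toks) for toks in corpus.values()) / n_docs

    # Document frequency: in how many documents does each token occur?
    doc_freq = Counter()
    for toks in corpus.values():
        doc_freq.update(set(toks))

    scores = {}
    for name, toks in corpus.items():
        tf = Counter(toks)
        # Length normalization term for this document.
        norm = k1 * (1 - b + b * len(toks) / avg_len)
        score = 0.0
        for tok in set(query_tokens):
            if tok not in tf:
                continue
            df = doc_freq[tok]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf[tok] * (k1 + 1) / (tf[tok] + norm)
        scores[name] = score
    return scores


# Hypothetical toy data: tokens that might be extracted from OSS source trees.
oss_components = {
    "zlib": ["inflate", "deflate", "adler32", "crc32", "incorrect header check"],
    "libpng": ["png_create_read_struct", "crc32", "IHDR", "IDAT"],
    "sqlite": ["sqlite3_open", "sqlite3_exec", "database is locked"],
}

# Hypothetical strings recovered from a binary under analysis.
binary_strings = ["inflate", "adler32", "incorrect header check", "crc32"]

ranked = sorted(
    bm25_scores(binary_strings, oss_components).items(),
    key=lambda kv: kv[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup the rare tokens (`inflate`, `adler32`) carry more weight than `crc32`, which appears in two components, so the ranking favors the component that shares distinctive tokens with the binary rather than merely common ones.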