SBOM generation based on code-level external component trees.
| Title: | SBOM generation based on code-level external component trees. |
|---|---|
| Authors: | Kong X; Purple Mountain Laboratories, Nanjing, 211111, China. xlkong@seu.edu.cn., Zhuo H; Purple Mountain Laboratories, Nanjing, 211111, China.; School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China., Miao X; Purple Mountain Laboratories, Nanjing, 211111, China., Huang W; Purple Mountain Laboratories, Nanjing, 211111, China., Du J; Purple Mountain Laboratories, Nanjing, 211111, China. |
| Source: | Scientific reports [Sci Rep] 2025 Nov 24; Vol. 15 (1), pp. 45277. Date of Electronic Publication: 2025 Nov 24. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Info: | Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: MEDLINE; PubMed not MEDLINE |
| Imprint Name(s): | Original Publication: London : Nature Publishing Group, copyright 2011- |
| Abstract: | A large body of research and engineering effort has been dedicated to extracting and utilizing software bills of materials (SBOMs), driven by supply chain security requirements. Existing approaches primarily rely on property files from dependency management tools, e.g., pom.xml from Maven, to generate SBOMs. However, the effectiveness of an SBOM is directly affected by the bloated and missing dependencies that arise during dependency management. In this work, we propose a source code-based approach to SBOM generation that focuses only on identifying the external components that are actually used. To support this, we introduce a novel structure called the external component tree (ECT), which organizes code-level dependency declarations at scale. We design three filters to eliminate programming-language-native, project-specific, and unused external components, and further apply subtree trimming algorithms to extract representative components from complex dependency hierarchies. Our approach is evaluated on 30 open-source projects in Java, Python, and Scala and compared with CycloneDX-Generator, OpenRewrite, Build-Info-Go, and Microsoft SBOM-Tool. The results show that all SBOMs generated by our approach are correct, successfully addressing the bloated dependencies in the experiments. Furthermore, our method achieves a recall rate of 99.8%, the highest among all evaluated tools, indicating that very few actual software dependencies are omitted. (© 2025. The Author(s).) |
| Competing Interests: | The authors declare no competing interests. |
| Grant Information: | ZL042401 Technology Project of Purple Mountain Laboratories (Research on Fundamental Theories and Toolchain for Endogenous Safety and Security) |
| Entry Date(s): | Date Created: 20251123 Latest Revision: 20260101 |
| Update Code: | 20260130 |
| PubMed Central ID: | PMC12748611 |
| DOI: | 10.1038/s41598-025-29762-0 |
| PMID: | 41276602 |
| Database: | MEDLINE |
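
The abstract names three filters (language-native, project-specific, unused) followed by subtree trimming. The record gives no implementation details, so the following is only a minimal Python sketch of that general idea and not the authors' algorithm. The function name `external_components` and the parameter `project_packages` are hypothetical, a flat map of imported names stands in for the external component tree, and "trimming" is reduced to keeping the root package of each dotted import path.

```python
import ast
import sys

# Assumption: sys.stdlib_module_names (Python 3.10+) approximates "language-native";
# the tiny fallback set only keeps the sketch runnable on older interpreters.
NATIVE = getattr(sys, "stdlib_module_names", frozenset({"os", "sys", "json", "re"}))


def external_components(source, project_packages):
    """Return the root packages that `source` imports from outside and references."""
    tree = ast.parse(source)

    # Map every locally bound import name to the dotted path it came from.
    bound = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                local = alias.asname or alias.name.split(".")[0]
                bound[local] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                bound[alias.asname or alias.name] = node.module + "." + alias.name

    # Identifiers referenced in the code; import statements create no ast.Name nodes.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    components = set()
    for local, dotted in bound.items():
        root = dotted.split(".")[0]       # trimming stand-in: keep the root package
        if root in NATIVE:                # filter 1: language-native
            continue
        if root in project_packages:      # filter 2: project-specific
            continue
        if local not in used:             # filter 3: imported but never referenced
            continue
        components.add(root)
    return sorted(components)


sample = """
import os
import numpy as np
import requests
from mypkg.utils import helper
from yaml.loader import SafeLoader

print(os.getcwd(), np.zeros(3), helper(), SafeLoader)
"""
print(external_components(sample, {"mypkg"}))
```

In this sample, `os` is dropped as native, `mypkg` as project-specific, and `requests` as unused, leaving `numpy` and `yaml`. The source is only parsed, never executed, so none of the listed packages need to be installed.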