Automating fuzz driver generation for deep learning libraries with large language models.

Saved in:
Bibliographic Details
Title: Automating fuzz driver generation for deep learning libraries with large language models.
Authors: Zheng, Tianming, Meng, Fanchao, Yi, Ping, Wu, Yue
Source: Cybersecurity (2523-3246); 1/4/2026, Vol. 9 Issue 1, p1-21, 21p
Subject Terms: LANGUAGE models, COMPUTER software testing, TEST systems, STATISTICAL reliability, MECHANIZATION
Abstract: The widespread adoption of deep learning (DL) libraries has raised concerns about their reliability and security. While prior works leveraged large language models (LLMs) to generate test programs for DL library APIs, the hardcoded program behaviors and low code validity rates render them impractical for real-world testing. To address these challenges, we propose FD-FACTORY, a fully automated framework that leverages LLMs to generate fuzz drivers for DL API testing. The fuzz driver programs accept mutated inputs from fuzzing engines to achieve effective code analysis. Inspired by the modular design of industrial production lines, FD-FACTORY decomposes the generation process into eight distinct stages: Preparation, Initial Fuzz Driver Generation, Early Stop Checks, Verification, Issue Diagnosis, Decision Making, Repair Loop, and Deployment. Each stage is handled by dedicated agents or tools to enhance construction efficiency. Experimental results demonstrate that FD-FACTORY achieves 73.67% and 65.33% success rates in generating fuzz drivers for PyTorch and TensorFlow, respectively, an improvement of 34.66% to 54.66% over existing approaches. In addition, FD-FACTORY provides more comprehensive coverage tracking by supporting both Python and native C/C++ code. It achieves a total coverage of 308,351 lines on PyTorch and 528,427 lines on TensorFlow, substantially surpassing the results reported by previous approaches. Unlike prior approaches that rely on repeated interactions with LLM servers throughout the entire testing process, our framework confines the use of LLMs strictly to the fuzz driver generation stages before deployment. Once generated, the fuzz drivers can be reused without further LLM involvement, thereby enhancing the practicality and sustainability of LLM-assisted fuzzing in real-world scenarios. [ABSTRACT FROM AUTHOR]
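The abstract notes that the generated fuzz drivers accept mutated inputs from fuzzing engines. The paper's drivers are not reproduced here; the following is a minimal, hypothetical sketch of the general pattern such a driver follows, decoding raw fuzzer bytes into bounded API arguments (all names, bounds, and helpers are illustrative assumptions, not FD-FACTORY code):

```python
# Hypothetical sketch of a fuzz driver entry point: decode mutated bytes
# into bounded tensor-shape arguments before invoking a DL library API.
MAX_DIMS = 4       # assumed upper bound on tensor rank
MAX_DIM_SIZE = 64  # assumed upper bound on each dimension

def bytes_to_shape(data: bytes) -> tuple:
    """Decode fuzzer-mutated bytes into a bounded tensor shape."""
    if not data:
        return (1,)
    ndims = data[0] % MAX_DIMS + 1
    dims = []
    for i in range(ndims):
        # Reuse payload bytes cyclically so short inputs still yield a shape.
        b = data[1 + i % (len(data) - 1)] if len(data) > 1 else 1
        dims.append(b % MAX_DIM_SIZE + 1)
    return tuple(dims)

def fuzz_one_input(data: bytes) -> tuple:
    """Driver body: a real harness would pass `shape` to a DL API,
    e.g. torch.zeros(shape); here we only validate the decoded arguments."""
    shape = bytes_to_shape(data)
    assert 1 <= len(shape) <= MAX_DIMS
    assert all(1 <= d <= MAX_DIM_SIZE for d in shape)
    return shape
```

Because the driver is an ordinary function of the input bytes, a coverage-guided engine (e.g. libFuzzer via a Python binding) can call `fuzz_one_input` repeatedly with mutated payloads, with no LLM in the loop after generation, matching the reuse property the abstract emphasizes.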
Copyright of Cybersecurity (2523-3246) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
ISSN: 2523-3246
DOI:10.1186/s42400-025-00532-9