What question did this study set out to answer?

This research aims to assess advancements in automated fuzz driver generation and establish a systematic evaluation framework.

April 23, 2026 | Open Access

Fuzz Driver Generation: A Survey and Outlook from the Perspective of Data Sources

Key Points

Assessed advances in automated fuzz driver generation and established a systematic evaluation framework.
Developed a taxonomy that groups fuzz driver generation approaches into four trajectories according to their primary data source.
Introduced a comparability-oriented evaluation perspective covering validity, reachability-related evidence, and reproducibility and cost.
Analyzed how large language models are integrated into fuzz driver generation workflows.
Built a structured account of the constraints and methodologies involved in fuzz driver generation.
Identified limitations in current fuzzing practice, including the need for better context handling.
Outlined three future research directions to improve fuzz driver adaptability and cross-ecosystem usability.

Abstract

Fuzzing is an essential element of software supply chain security governance. Despite its importance, the widespread adoption of library fuzzing is limited by the significant costs associated with constructing fuzz drivers. Because a library has no single, clear entry point, the reachable path space of the target library is determined by the interplay of API call sequences, parameter dependencies, and state constraints. As a result, fuzz drivers must not only build successfully but also provide sufficient semantic context to enable exploration of deeper state machine interactions, thereby avoiding premature stagnation at superficial validation logic. To systematically assess advancements in automated fuzz driver generation, this paper develops a taxonomy organized around the primary data sources used to derive driver-generation constraints, categorizing existing approaches into four technological trajectories: Usage Artifact Mining, Source Code Constraint Inference, Binary Semantics Recovery, and Heterogeneous Data Fusion. Large language models are increasingly integrated into these workflows as generators and as components for constraint alignment and repair. To address inconsistencies in experimental methodologies, this paper introduces a bounded comparability-oriented evaluation perspective focused on three dimensions: validity, reachability-related evidence, and reproducibility and cost. Together with a disclosure and reporting protocol for metric comparability, this perspective clarifies the information needed for cross-study comparison and examines the unique features and inherent limitations of each technical trajectory.
Based on these findings, three key directions for future research are identified: facilitating structural evolution in response to coverage plateaus to address deep logic unreachability; coordinating dynamic closed-loop orchestration that utilizes on-demand heterogeneous data retrieval to resolve context challenges; and developing language-agnostic driver representations with pluggable adaptation mechanisms to improve cross-ecosystem portability and scalability.
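
As a rough illustration of why a driver needs semantic context and not just a successful build, the sketch below contrasts two drivers for a toy stateful API. Everything here is an assumption made for illustration: `ToyParser`, `naive_driver`, and `context_aware_driver` are hypothetical names that do not come from the paper or from any real library, and the example does not reproduce any of the surveyed techniques.

```python
class ToyParser:
    """Hypothetical stateful library API, invented for illustration only."""

    def __init__(self) -> None:
        self.state = "new"
        self.limit = 0

    def configure(self, limit: int) -> None:
        # State constraint: configure() must be the first call.
        if self.state != "new":
            raise RuntimeError("configure() must be the first call")
        if not 1 <= limit <= 64:
            raise ValueError("limit out of range")
        self.limit = limit
        self.state = "configured"

    def feed(self, payload: bytes) -> int:
        # State constraint: feed() is only legal after configure().
        if self.state != "configured":
            raise RuntimeError("feed() requires a configured parser")
        if len(payload) > self.limit:
            raise ValueError("payload exceeds limit")
        self.state = "fed"
        return sum(payload) % 256  # stands in for the deeper parsing logic


def naive_driver(data: bytes) -> str:
    """Runs without crashing, but ignores the required call order."""
    parser = ToyParser()
    try:
        parser.feed(data)
    except (RuntimeError, ValueError):
        pass  # API-misuse errors, not bugs in the target
    return parser.state


def context_aware_driver(data: bytes) -> str:
    """Honours the call sequence and derives a valid limit from the input."""
    if not data:
        return "new"
    parser = ToyParser()
    limit = data[0] % 64 + 1        # parameter constraint: 1 <= limit <= 64
    payload = data[1:1 + limit]     # parameter dependency: payload fits limit
    try:
        parser.configure(limit)
        parser.feed(payload)
    except (RuntimeError, ValueError):
        pass
    return parser.state
```

Both drivers run on any input, yet on the same bytes the naive driver never leaves the initial state while the context-aware one reaches the `fed` state. That gap is the kind of difference a reachability-oriented comparison is meant to expose.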

Cite this study

Feng et al. studied this question.

www.synapsesocial.com/papers/69e9baeb85696592c86ecdfd — DOI: https://doi.org/10.3390/bdcc10040129

Authors

Xiao Feng

Shuaibing Lu

Taotao Gu

Journals

Big Data and Cognitive Computing

Institutions

Tsinghua University

Southeast University
