March 3, 2026 | Open Access

AICD Bench: A Challenging Benchmark for AI-Generated Code Detection

Key Points

Detection of AI-generated code remains limited, particularly for hybrid or adversarial code.
Extensive evaluation of neural and classical detectors on code from 77 models shows that robust binary classification remains difficult under distribution shift.
The three detection tasks are robust binary classification, model family attribution, and fine-grained human-machine classification.
AICD Bench, spanning 2M examples and 9 programming languages, highlights the need for advanced detection methods.

Abstract

Large language models (LLMs) are increasingly capable of generating functional source code, raising concerns about authorship, accountability, and security. While detecting AI-generated code is critical, existing datasets and benchmarks are narrow, typically limited to binary human-machine classification under in-distribution settings. To bridge this gap, we introduce AICD Bench, the most comprehensive benchmark for AI-generated code detection. It spans 2M examples, 77 models across 11 families, and 9 programming languages, including recent reasoning models. Beyond scale, AICD Bench introduces three realistic detection tasks: (i) Robust Binary Classification under distribution shifts in language and domain, (ii) Model Family Attribution, grouping generators by architectural lineage, and (iii) Fine-Grained Human-Machine Classification across human, machine, hybrid, and adversarial code. Extensive evaluation on neural and classical detectors shows that performance remains far below practical usability, particularly under distribution shift and for hybrid or adversarial code. We release AICD Bench as a unified, challenging evaluation suite to drive the next generation of robust approaches for AI-generated code detection. The data and the code are available at https://huggingface.co/AICD-bench.
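To make the task setup concrete, the following is a minimal sketch of how a held-out-language split and a four-class score could be computed. It is not the benchmark's own evaluation code: the `language` field name, the helper names `split_by_language` and `macro_f1`, and the choice of macro-averaged F1 as the metric are all assumptions made for illustration. Only the four class names come from the abstract.

```python
# Label space for the fine-grained task, as listed in the abstract.
FINE_GRAINED_LABELS = ("human", "machine", "hybrid", "adversarial")


def split_by_language(samples, train_languages):
    """Simulate a language shift: train on some languages, test on the rest.

    `samples` is a list of dicts with a hypothetical "language" key.
    """
    train = [s for s in samples if s["language"] in train_languages]
    test = [s for s in samples if s["language"] not in train_languages]
    return train, test


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed metric, not from the source).

    A class with no true positives, false positives, or false negatives
    contributes 0.0 to the mean.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights the four classes equally, so a detector that ignores the rarer hybrid or adversarial classes is penalized rather than rewarded for majority-class accuracy.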

Authors

Daniil Orel

Mohamed bin Zayed University of Artificial Intelligence

Dilshod Azizov

Mohamed bin Zayed University of Artificial Intelligence

Indraneil Paul

Technical University of Darmstadt
