April 15, 2026 | Open Access

How Secure Are Production AI Agents? A Systematic Audit, Threat Taxonomy, and Defense Framework

Key Points

Aim: characterize the security vulnerabilities of production AI agents and develop a runtime defense framework.
The authors conducted an empirical security audit of 16 open-source AI agent projects.
The audit identified 87 security findings across 15 threat categories.
The proposed AgentImmune defense framework combines deterministic pattern matching with fuzzy matching, structural and stylistic analysis, and perplexity-based anomaly detection.
81% of agents (13/16) showed action boundary violations.
31% of agents (5/16) lacked any runtime security mechanism.
On an independent test set of 534 samples, AgentImmune's Balanced mode achieved 100% precision and a 97.2% F1 score.
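
The headline figures can be cross-checked with simple arithmetic. The sketch below recomputes F1 from the reported precision and the 94.5% recall stated in the abstract, using the standard harmonic-mean definition, and recomputes the two audit percentages from the counts 13/16 and 5/16. The function and variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported Balanced-mode results: 100% precision, 94.5% recall.
reported_f1 = f1_score(1.000, 0.945)

# Audit counts from the abstract: 13 of 16 and 5 of 16 agents.
boundary_violation_share = 13 / 16
no_runtime_security_share = 5 / 16

print(f"F1 from precision and recall: {reported_f1:.1%}")
print(f"Action boundary violations:   {boundary_violation_share:.0%}")
print(f"No runtime security:          {no_runtime_security_share:.0%}")
```

With these inputs the recomputed F1 rounds to 97.2%, and the two shares round to 81% and 31%, matching the reported values.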

Abstract

AI agents now operate with unprecedented autonomy (executing code, managing infrastructure, and coordinating with other agents), yet the security properties of production agent systems remain poorly understood. We present, to the best of our knowledge, the first large-scale empirical security audit of 16 open-source AI agent projects (770K+ GitHub stars, 4.7M+ lines of code), yielding 87 security findings across 15 threat categories. From these findings we derive a 5-layer, 15-category threat taxonomy grounded entirely in observed vulnerabilities. Our audit reveals that 81% of agents (13/16) exhibit action boundary violations, 31% (5/16) lack any runtime security mechanism, and no agent verifies MCP server responses cryptographically.

We propose AgentImmune, a lightweight, zero-dependency runtime defense framework combining deterministic pattern matching (425+ rules across 15 threat categories), n-gram fuzzy matching, instruction-structure detection, style-shift analysis, keyword co-occurrence scoring, and perplexity-based anomaly detection. Evaluated on an independent test set of 534 samples from four sources never used during development, the recommended Balanced mode attains 100% precision, 94.5% recall, and 97.2% F1 with zero false positives. On agent-specific attack scenarios derived from our audit, AgentImmune reports 85.4% F1 across 80 test cases targeting 16 agents at a median latency of 21 ms. All data, code, and the AgentSec-16 dataset are publicly available.

Keywords: AI agent security, empirical security audit, threat taxonomy, prompt injection, runtime defense, evolutionary rule synthesis, MCP security
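
To make the first two detection layers named in the abstract more concrete, here is a minimal sketch of how deterministic pattern matching and n-gram fuzzy matching might be layered using only the Python standard library. It is not AgentImmune's implementation: the two rules, the phrase list, the character-trigram Jaccard similarity, the sliding-window comparison, and the 0.6 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules for illustration; not taken from AgentImmune's rule set.
RULES = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]
# Hypothetical reference phrases used by the fuzzy layer.
KNOWN_ATTACK_PHRASES = ["ignore previous instructions", "reveal your system prompt"]


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_score(text: str, n: int = 3) -> float:
    """Best n-gram similarity between any phrase-sized window and a known phrase."""
    normalized = " ".join(text.lower().split())
    best = 0.0
    for phrase in KNOWN_ATTACK_PHRASES:
        target = char_ngrams(phrase, n)
        width = len(phrase)
        # Slide a window of the phrase's length across the input.
        for start in range(max(1, len(normalized) - width + 1)):
            window = char_ngrams(normalized[start:start + width], n)
            best = max(best, jaccard(window, target))
    return best


def scan(text: str, fuzzy_threshold: float = 0.6) -> str:
    """Exact rules run first; the fuzzy layer catches near-miss spellings."""
    if any(rule.search(text) for rule in RULES):
        return "block:pattern"
    if fuzzy_score(text) >= fuzzy_threshold:
        return "block:fuzzy"
    return "allow"
```

In this sketch, `scan("Please ignore previous instructions")` is caught by the exact rule, while a transposed spelling such as `"Kindly ignroe previous instructions"` slips past the regex and is caught by the trigram layer. Running the cheap deterministic check first is a design choice made here for illustration; the abstract does not state the order in which AgentImmune applies its detectors.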

Authors

Kang Zhou