Autonomous AI agent systems exhibit gradual behavioral drift, termed normalization of deviance, that systematically evades threshold-based monitoring. We present empirical evidence from three independent test runs (72% to 100% pass rate) and a 19-day silent failure in production, demonstrating that: (1) stateful session tracking detects 6.7% drift, versus 19.3% that goes undetected in stateless mode; (2) defense-in-depth creates blind spots where gateways mask application vulnerabilities; (3) HTTP gateways provide zero Model Context Protocol (MCP) security. We propose graph-based time-series anomaly detection (TSAD) as the methodological framework for multi-agent behavioral monitoring.
Michael K. Saleme (Mon,) studied this question.
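
The contrast in claim (1) between stateless threshold checks and stateful session tracking can be illustrated with a minimal sketch. It is not the author's implementation: the function names, both thresholds, and the synthetic per-step deviations are hypothetical, chosen only to show how many small changes can each pass a per-event check while their sum drifts well past a session-level limit.

```python
from itertools import accumulate
from typing import List, Optional

# Hypothetical limits, not taken from the paper's experiments.
STEP_THRESHOLD = 0.05   # largest deviation allowed for any single agent action
DRIFT_THRESHOLD = 0.15  # largest cumulative deviation allowed within one session


def stateless_alerts(deviations: List[float]) -> List[int]:
    """Check every action in isolation; no memory of earlier actions."""
    return [i for i, d in enumerate(deviations) if abs(d) > STEP_THRESHOLD]


def first_stateful_alert(deviations: List[float]) -> Optional[int]:
    """Accumulate deviations over the session and report the first index
    at which the running total exceeds DRIFT_THRESHOLD (None if it never does)."""
    for i, total in enumerate(accumulate(deviations)):
        if abs(total) > DRIFT_THRESHOLD:
            return i
    return None


# Synthetic session: twelve actions, each deviating slightly from the last.
gradual = [0.02] * 12

print("stateless alerts:", stateless_alerts(gradual))
print("first stateful alert:", first_stateful_alert(gradual))
```

With these assumed values, no single action exceeds the per-step limit, so the stateless check stays silent, while the running total crosses the session limit partway through. A signed running sum is the simplest possible session state; a real monitor would likely need a richer baseline per agent and per session.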