Status: NeurIPS 2026 submission under double-blind review. Author identity anonymized.

Self-improving AI agents lack runtime safeguards that prevent evaluation drift, fragile outcome acceptance, and unbounded parameter updates from compounding into catastrophic policy degradation. We study cognitive policy oscillation, that is, strategy degradation caused by hallucinated feedback, and map an oscillation phase diagram for self-improving agents (384 synthetic + 32 LLM conditions). A sharp instability boundary emerges at moderate step sizes (h ≈ 0.2), yielding a phase-aware deployment rule.

We present WhyLab, a conditional causal audit framework that activates only in the unstable regime. It has three components:
C1: Information-theoretic drift index
C2: Sensitivity filter combining E-values and partial R² bounds
C3: Lyapunov-bounded damping controller

Our contribution is boundary delineation: identifying when intervention is warranted, rather than claiming universal improvement. In controlled unstable regimes, the audit reduces oscillation by 76%. On adversarial LLM tasks, fixed C2 reduces regressions by 44% on Gemini 2.0 Flash (p=0.014, Bonferroni-adjusted p=0.042). In the stable regime (SWE-bench Lite, 10,500 episodes), the audit remains inactive, as predicted. Docker evaluations on Gemini 2.0/2.5 Flash show zero observed C2-caused regressions.

Change log (v2 vs v1): abstract condensed to the boundary-delineation framing, with honest acknowledgement of null results; selective SWE-bench follow-up with targeted C2 transparently reported (no net gain vs fixed C2); full Docker evaluation on Gemini 2.5 Flash added; phase-aware deployment rule formalized; references and deployment checklist expanded.
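As a rough illustration of two quantities named above, the sketch below computes a standard E-value for a risk ratio and a Bonferroni-adjusted p-value. It is not taken from WhyLab: the function names are hypothetical, the E-value uses the common closed form RR + sqrt(RR * (RR - 1)), and the comparison count of 3 is an assumption, chosen only because it is consistent with the reported pair p=0.014 and adjusted p=0.042. The abstract does not state how many comparisons were made or how C2 combines E-values with partial R² bounds.

```python
import math


def e_value(rr: float) -> float:
    """Standard E-value for an observed risk ratio (illustrative helper)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted before applying the formula.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def bonferroni(p: float, m: int) -> float:
    """Bonferroni adjustment: multiply by the number of comparisons, cap at 1."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return min(1.0, p * m)


if __name__ == "__main__":
    # Assumed m=3; the source reports only the raw and adjusted p-values.
    print(round(bonferroni(0.014, 3), 3))
    print(round(e_value(2.0), 3))
```

A larger E-value means a stronger unmeasured confounder would be needed to explain away the observed effect, which is the sense in which such a quantity can serve as a sensitivity filter.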
Anonymous Author (Sun,) studied this question.
www.synapsesocial.com/papers/69e71423cb99343efc98d8f9 — DOI: https://doi.org/10.5281/zenodo.19063714
Anonymous Author
American Foundation for the Blind