What question did this study set out to answer?

The study set out to develop a framework that keeps self-improving AI agents stable and safe at runtime.

March 13, 2026 | Open Access

WhyLab: A Causal Audit Framework for Stable Agent Self-Improvement

Key Points

The research aims to develop a framework that keeps self-improving AI agents stable and safe at runtime.
The framework is a causal audit with three defenses: information-theoretic drift detection (C1), dual-threshold filtering (C2), and adaptive damping (C3).
Experiments in synthetic environments evaluated the proposed defenses.
The analysis covered detection reliability, fragile acceptance rates, and violation frequencies.
C1 improved detection reliability within the operational horizon.
C2 substantially reduced the rate at which fragile outcomes were accepted.
C3 achieved the lowest violation frequency, with strong alignment between the energy proxy and the underlying state.
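
The key points name information-theoretic drift detection (C1) without giving its form. As a minimal sketch, assuming drift is measured as the KL divergence between a histogram of recent evaluation scores and a histogram of reference scores, it might look like the following. The function names, the bin count, the score range of [0, 1], and the threshold of 0.1 are all hypothetical and are not taken from the WhyLab code.

```python
import math


def histogram(scores, bins=10, lo=0.0, hi=1.0, eps=1e-6):
    """Smoothed probability histogram of scores assumed to lie in [lo, hi]."""
    counts = [0] * bins
    for s in scores:
        idx = int((s - lo) / (hi - lo) * bins)
        idx = max(0, min(idx, bins - 1))  # clamp edge and out-of-range values
        counts[idx] += 1
    total = len(scores) + eps * bins
    # eps keeps every bin strictly positive so the log below is defined
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def drift_detected(reference, window, threshold=0.1, bins=10):
    """Flag drift when the recent window diverges from the reference scores.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    p = histogram(window, bins)
    q = histogram(reference, bins)
    return kl_divergence(p, q) > threshold
```

In this sketch a window drawn from the same distribution as the reference yields a divergence near zero, while a window concentrated in a single score bin yields a large one. The paper may use a different divergence or a sequential test.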

Abstract

Self-improving AI agents lack runtime safeguards that prevent evaluation drift, fragile outcome acceptance, and unbounded parameter updates from compounding into catastrophic policy degradation. WhyLab introduces a causal audit framework comprising three complementary defenses:

C1: Information-theoretic drift detection across evaluation streams
C2: E-value × Robustness Value dual-threshold filter for fragile outcomes
C3: Lyapunov-bounded adaptive damping with observable energy proxy

Experiments on synthetic environments demonstrate that C1 improves within-horizon detection reliability, C2 substantially reduces fragile acceptance rates, and C3 achieves the lowest violation frequency with strong proxy–state alignment.

Code: https://github.com/neogenesislab/WhyLab-NeurIPS2026
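
The abstract does not define the C2 filter, so the following is only a sketch under stated assumptions. It assumes "E-value" means the sensitivity-analysis E-value of VanderWeele and Ding for a risk ratio, that "Robustness Value" means the Cinelli and Hazlett robustness value for a regression coefficient, and that "×" means both thresholds must pass jointly. The function names and the cutoffs `e_min` and `rv_min` are hypothetical.

```python
import math


def e_value(risk_ratio):
    """E-value for a point-estimate risk ratio (VanderWeele-Ding formula)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted before applying the formula.
    rr = risk_ratio if risk_ratio >= 1.0 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


def robustness_value(t_stat, dof, q=1.0):
    """Robustness value RV_q from a coefficient's t-statistic (Cinelli-Hazlett)."""
    f_q = q * abs(t_stat) / math.sqrt(dof)  # scaled partial Cohen's f
    return 0.5 * (math.sqrt(f_q ** 4 + 4.0 * f_q ** 2) - f_q ** 2)


def accept_outcome(risk_ratio, t_stat, dof, e_min=2.0, rv_min=0.1):
    """Accept an outcome only if both sensitivity measures clear their cutoffs.

    e_min and rv_min are illustrative assumptions, not values from the paper.
    """
    return (e_value(risk_ratio) >= e_min
            and robustness_value(t_stat, dof) >= rv_min)
```

Under these assumptions, an outcome with a large effect but a weak t-statistic, or a strong t-statistic but a risk ratio near 1, is rejected as fragile because it fails one of the two thresholds.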

Cite this study

Anonymous (Wed,) studied this question.

www.synapsesocial.com/papers/69b3ac2b02a1e69014ccda8e — DOI: https://doi.org/10.5281/zenodo.18948929
