We identify and formalize AI In The Loop (AITL), a paradigm in which AI systems autonomously generate, evaluate, and improve, with human intervention restricted to boundary supervision rather than operational decision-making. AITL extends the principle behind reinforcement learning from AI feedback (RLAIF), replacing human feedback with AI feedback, from training to the full AI system lifecycle. This is a framework paper with proof-of-concept validation, not a benchmark study. Through analysis of four systems (AlphaZero, Constitutional AI, SWE-agent, autoresearch), we extract common properties and propose a unifying taxonomy: self-generation, self-evaluation, self-improvement, and human observation. We validate AITL through a controlled experiment using the Autonomous Empirical Optimization System (AEOS), a model-agnostic ML sandbox in which two LLM agents autonomously built ML pipelines on a semantically stripped dataset. In this experiment, we observe that the dominant human role shifts from iterative ML engineering (O(n) human effort over n iterations) to boundary supervision (O(1) per experiment). Our contributions are: (1) formalization of AITL as a unifying framework for closed-loop autonomous systems, (2) a taxonomy connecting existing systems under shared properties, (3) empirical validation via AEOS demonstrating autonomous agent stopping behavior, and (4) identification of failure modes, including a novel Sunk-Cost Continuation failure mode (F6), in which agents continue low-yield exploration despite prolonged stagnation. We position AITL as a natural evolution of AI evaluation and suggest that it opens scalable directions that are infeasible under human-in-the-loop (HITL) constraints.
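The closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the AEOS implementation: the function `run_closed_loop`, the `LoopResult` type, and the parameters `budget`, `patience`, and `min_gain` are all hypothetical names introduced here. The sketch assumes that a candidate can be scored by a single number and that stagnation is detected with a simple patience counter, which is one possible guard against the Sunk-Cost Continuation behavior (F6) rather than the paper's stated method.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    best_score: float
    iterations: int
    stop_reason: str          # "stagnation" or "budget"
    history: list = field(default_factory=list)


def run_closed_loop(generate, evaluate, budget=50, patience=5, min_gain=1e-3):
    """Hypothetical closed loop: propose, score, keep the best, stop on stagnation.

    Assumption: the human fixes budget, patience, and min_gain once before
    the run (boundary supervision) and makes no per-iteration decisions.
    """
    best_candidate, best_score = None, float("-inf")
    stale = 0                 # consecutive iterations without a real gain
    history = []
    for step in range(1, budget + 1):
        candidate = generate(best_candidate)   # self-generation
        score = evaluate(candidate)            # self-evaluation
        history.append(score)
        if score > best_score + min_gain:      # self-improvement
            best_candidate, best_score = candidate, score
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            # Stop instead of continuing low-yield exploration.
            return LoopResult(best_score, step, "stagnation", history)
    return LoopResult(best_score, budget, "budget", history)
```

In this sketch the human supplies three numbers per experiment regardless of how many iterations the loop runs, which is one way to read the O(1) boundary-supervision claim. Setting `patience` very high would reproduce continued exploration under stagnation until the budget is exhausted.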
Sanskar jajoo
www.synapsesocial.com/papers/69df2ba0e4eeef8a2a6b09f3 — DOI: https://doi.org/10.5281/zenodo.19551173