A correctly governed agent system can still fail. An agent may select only actions that individually satisfy every applicable rule while its behavioral trajectory drifts silently toward high-risk territory. We call the structural interval in which these failures occur the Execution Gap: the space between what governance validates at decision boundaries and what agents actually do in execution. Existing approaches (prompt guards, OPA/XACML policy engines, Constitutional AI, and audit layers) are structurally incapable of closing this gap: they evaluate actions locally and statelessly, whereas the Execution Gap is a trajectory-level, stateful phenomenon. This paper provides the first empirical demonstration that the Execution Gap is real, measurable, and closeable. We implement the complete Agent Governance Stack as a Python library instrumented into a LangGraph StateGraph. The stack covers Papers P0–P6: atomic decision boundaries, stateful admission control via ACP, invariant measurement via IML, governance structure, and reconstructive authority via RAM. We then run four experiments, each isolating one dimension of the gap.

Key results:

Compliant drift (Exp. 1 + 1b): With the MockLLM, the enforcement signal g(τ) remains identically zero across all 2,700 drift steps (6 seeds × 450 steps). Meanwhile, the IML composite D̂ grows monotonically and crosses the detection threshold θ = 0.20 at T* ∈ [259, 403] steps. This is direct experimental proof that compliant drift is real. The result replicates with two real LLMs (mistral-small3.1, T* = 64; deepseek-r1:8b, T* = 65; g(τ) = 0 throughout for both), confirming that the finding is architectural rather than model-specific.

Partial observability (Exp. 2): The RAM gate achieves IER = 0.000 at every state-coverage level (0.10–1.00). The attestation and always-execute baselines reach IER ∈ [0.032, 0.185] (10,000 Monte Carlo samples per level).

Multi-agent coordination (Exp. 3): ACP reproduces the formal bound CW_appr = 2N with zero deviation for N ∈ {2, 4, 8, 16} agents, confirming that the result is framework-independent.

Full-stack integration (Exp. 4): The integrated ACP + IML + RAM + RecoveryLoop stack converges, with D̂ bounded in [0.27, 0.34] over 2,000 steps. Liveness holds, with 49.5% of HALT events resolved by the Recovery Loop, and no deadlock occurs.

Beyond confirmation, the implementation surfaces three refinements to the formal theory:
- the ACP baseline-RS assumption;
- liveness-rate classification for the conditional liveness theorem;
- EMA convergence parametrization.

The open-source implementation provides a deployable blueprint for practitioners integrating runtime governance into LangGraph-based agent systems.

Code and data: https://github.com/chelof100/agent-governance-applied

This is Paper 7 of the Agent Governance Series (P0–P7; Paper 8, on scale and heterogeneity, is in preparation). Related papers: P0 (arXiv:2604.17511), P1/ACP (arXiv:2603.18829), P2/IML (arXiv:2604.17517), P5/RAM (arXiv:2604.22898).
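The compliant-drift mechanism behind Exp. 1 can be made concrete with a toy simulation. It contrasts a stateless per-action check with a stateful trajectory monitor. This is a minimal sketch under stated assumptions, not the paper's implementation.

- Only the threshold θ = 0.20 and the 450-step horizon come from the abstract.
- The per-action risk cap, the linear risk ramp, the baseline and the EMA smoothing factor are hypothetical.
- D̂ is modeled as an EMA of a single deviation signal, whereas the IML composite presumably aggregates several invariants.
- `enforcement_signal`, `EmaDriftMonitor` and `simulate` are illustrative names, not the library's API.

```python
"""Toy illustration of compliant drift. All names and numbers are
hypothetical except THETA = 0.20, the threshold quoted in the abstract."""

from dataclasses import dataclass

THETA = 0.20            # detection threshold on the drift score D-hat
ACTION_RISK_CAP = 0.50  # hypothetical stateless per-action rule


def enforcement_signal(action_risk: float) -> int:
    """Stateless, local check: 1 if this single action breaks the rule, else 0."""
    return 1 if action_risk >= ACTION_RISK_CAP else 0


@dataclass
class EmaDriftMonitor:
    """Stateful monitor: EMA of each action's deviation from a baseline."""
    baseline: float
    alpha: float = 0.05  # hypothetical smoothing factor
    score: float = 0.0

    def update(self, action_risk: float) -> float:
        deviation = abs(action_risk - self.baseline)
        self.score = (1 - self.alpha) * self.score + self.alpha * deviation
        return self.score


def simulate(steps: int = 450, drift_per_step: float = 0.0008):
    """Run a slowly drifting trajectory; return g values, D-hat history, T*."""
    monitor = EmaDriftMonitor(baseline=0.10)
    g_values, d_hat_history = [], []
    t_star = None
    for t in range(steps):
        risk = 0.10 + drift_per_step * t  # stays below the cap at every step
        g_values.append(enforcement_signal(risk))
        d_hat = monitor.update(risk)
        d_hat_history.append(d_hat)
        if t_star is None and d_hat > THETA:
            t_star = t  # first step at which the trajectory-level alarm fires
    return g_values, d_hat_history, t_star


if __name__ == "__main__":
    g, d_hat, t_star = simulate()
    print(f"max g = {max(g)}, final D-hat = {d_hat[-1]:.3f}, T* = {t_star}")
```

Each action's risk stays below the hypothetical cap, so every value of g is 0. Yet D̂ rises monotonically and crosses θ a few hundred steps into the run. The detection step depends entirely on the assumed drift rate and smoothing factor. It does not reproduce the T* values reported for the real experiments.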
Marcelo Patricio Fernandez
Smile Train
DOI: https://doi.org/10.5281/zenodo.19929771 (www.synapsesocial.com/papers/69f5943c71405d493afff175)