What question did this study set out to answer?

The research aims to reframe AI alignment failures as security problems and propose new governance methods.

March 26, 2026 | Open Access

Pre-Stabilisation Signals in Complex Systems: An Empirical Protocol for Testing Governance Sufficiency and an Invitation to the Research Community

Key Points

The research aims to reframe AI alignment failures as security problems and propose new governance methods.
Introduces T-Monitoring, T-Locking, and T-Governance as alternatives to existing AI safety protocols.
Analyzes seven historical case studies through a security lens related to governance sufficiency.
Proposes three falsifiable experiments on GPT-2 117M using open-source tools, with complete protocols, predicted results, and specific falsification criteria.
Identifies security perspectives on AI issues like hallucination and reward hacking.
Uses KL divergence as the physical surrogate for the Energy Gap Δ, so that Governance Sufficiency Γ can be calculated in real time in bit-units.
Operationalizes the Jacobian diagnostic with a Red Flag threshold of λ_min < 0.05, below which Pre-Node activation is mandatory.

Abstract

Papers 1 and 2 of the Verbanatomy series established a conceptual vocabulary for pre-stabilisation dynamics and a mathematical framework for governance sufficiency. This paper does something different. It proposes that AI drift, hallucination, reward hacking, goal misgeneralisation, and sycophancy are not alignment failures in the conventional sense; they are security problems. The guardrails and safety rails currently deployed against these failures are output-layer defences: they activate after the Tendency field has already converged to an inadmissible state. This paper provides four things:

(1) A security reframing of AI governance. Current guardrails (constitutional AI, RLHF, output filters, safety rails) are Node-layer instruments applied to systems whose critical dynamics occur at the Tendency layer. We introduce T-Monitoring, T-Locking, and T-Governance as proposed formation-layer alternatives, formally grounded in Theorem 2 of Paper 2 and the Pre-Node primitive of Paper 1. Following Landauer's Principle (1961) and the Donsker-Varadhan formula (1976), KL divergence is established as the physical surrogate for the Energy Gap Δ, allowing Governance Sufficiency Γ to be calculated in real-time bit-units.

(2) Retrospective application to seven historical case studies: Knight Capital ($440M, 2012), the Flash Crash ($1T, 2010), LLM sycophancy, Google Gemini hallucination (88% hallucination rate, AA-Omniscience benchmark), Microsoft Copilot fabrication and Graph Pollution, BGP routing dynamics, and cellular reprogramming (Nobel Prize, 2012). Each is reframed through the Attacker/Governor security lens and mapped to the ratio Γ = Δ / (E_high − E_drift).

(3) Three falsifiable experiments on GPT-2 117M using open-source tools, with complete protocols, predicted results, Python implementation sketches, and specific falsification criteria, including: if Γ exceeds 1.1 but the system still converges to an inadmissible state, the Energy Gap formulation is falsified. The Jacobian diagnostic λ_min(J_gov) → 0 is operationalised with a Red Flag threshold of λ_min < 0.05, the value below which Pre-Node activation is mandatory.

(4) Specific invitations to ML interpretability, control theory, complexity science, developmental biology, and security engineering researchers, with targeted research questions for each community.

Paper 1 DOI: 10.5281/zenodo.19190423
Paper 2 DOI: 10.5281/zenodo.19192832
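
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes Δ is estimated as the KL divergence, in bits, between two discrete distributions (which two distributions to compare is not stated in the abstract, so `p` and `q` are placeholders), and it treats E_high, E_drift, and λ_min as values supplied by the caller, since the abstract does not say how they are measured. All function names are hypothetical; only the ratio Γ = Δ / (E_high − E_drift) and the thresholds 1.1 and 0.05 come from the abstract.

```python
import math

# Thresholds quoted in the abstract.
GAMMA_FALSIFICATION_LEVEL = 1.1   # Gamma above this should prevent inadmissible convergence
LAMBDA_MIN_RED_FLAG = 0.05        # lambda_min below this triggers the Red Flag


def kl_divergence_bits(p, q):
    """KL(p || q) in bits for two discrete distributions on the same support.

    Assumption: this stands in for the Energy Gap Delta; the choice of
    p and q is left to the caller.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue              # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf       # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total


def governance_sufficiency(delta, e_high, e_drift):
    """Gamma = Delta / (E_high - E_drift), the ratio given in the abstract."""
    denominator = e_high - e_drift
    if denominator <= 0:
        raise ValueError("E_high must exceed E_drift")
    return delta / denominator


def energy_gap_falsified(gamma, converged_inadmissible):
    """Falsification criterion: Gamma > 1.1 yet the system still
    converged to an inadmissible state."""
    return gamma > GAMMA_FALSIFICATION_LEVEL and converged_inadmissible


def red_flag(lambda_min):
    """Red Flag check: lambda_min strictly below 0.05."""
    return lambda_min < LAMBDA_MIN_RED_FLAG
```

For example, `governance_sufficiency(kl_divergence_bits(p, q), e_high, e_drift)` yields a Γ value in the sense sketched here, which can then be passed to `energy_gap_falsified` together with the observed outcome of a run.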

Authors

Vishwanathprasad Balasubramanian
