From Monitoring to Intervention: Control-Theoretic Coherence Management in Transformers and the Limits of Discrete Safety Enforcement

Author: Kentaro Sato (Independent Researcher)

Summary

Large language models fail in characteristic ways -- repetition loops, hallucination, and context loss -- yet most monitoring and alignment approaches treat these as unrelated problems. This paper introduces Recync, a control framework that unifies all three failure modes in a single three-dimensional order-parameter space Z(t) = (lambda, lambda_sem, z), representing temporal synchrony, semantic coherence, and structural persistence. A non-invasive projection (Phi-mapping) extracts this state from Transformer internals at runtime -- reading attention weights, residual-stream activations, and the KV cache -- without modifying model weights or requiring additional forward passes. The dynamics of Z(t) are governed by a Ginzburg-Landau potential with provable stability, and safety is enforced through stochastic Control Barrier Functions (CBFs) solved via quadratic programming. A Psi-mapping translates abstract control commands into operational parameter adjustments (temperature, top-p, hidden-state corrections) for closed-loop intervention. The paper then systematically tests how far this framework can push token-level intervention, establishing both its capabilities and its structural limits through a 69-experiment campaign.

Theoretical Framework

The framework consists of five components:

- State space and Phi-mapping: Three order parameters -- temporal synchrony lambda(t) from attention weights, semantic coherence lambda_sem(t) from residual-stream cosine similarity, and structural persistence z(t) from KV-cache autocorrelation -- are extracted at each generation step without model modification.
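The order-parameter extraction can be illustrated with a minimal sketch. The exact estimators are assumptions for illustration -- the summary does not give formulas -- here lambda_sem is taken as the mean cosine similarity of consecutive residual-stream vectors, and z as the lag-1 autocorrelation of a scalar KV-cache summary series (e.g. per-step key norms):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def lambda_sem(hidden_states):
    """Semantic coherence: mean cosine similarity of consecutive
    residual-stream vectors, one vector per generation step.
    (Illustrative estimator, not the paper's exact definition.)"""
    sims = [cosine(u, v) for u, v in zip(hidden_states, hidden_states[1:])]
    return sum(sims) / len(sims)

def z_persistence(kv_series, lag=1):
    """Structural persistence: lag-1 autocorrelation of a scalar
    KV-cache summary series. (Illustrative estimator.)"""
    n = len(kv_series)
    mean = sum(kv_series) / n
    var = sum((x - mean) ** 2 for x in kv_series)
    if var == 0:
        return 1.0  # constant series: perfectly persistent
    cov = sum((kv_series[t] - mean) * (kv_series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var
```

Both readings are non-invasive in the sense used above: they consume activations the forward pass already produces, so no extra forward pass is needed.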
- Ginzburg-Landau dynamics: The evolution of Z(t) is governed by a phenomenological potential U(Z; Theta) with formal stability guarantees (Theorem 1), following the precedent of GL theory in physics -- phenomenological yet predictive.
- CBF safety control: A stochastic Control Barrier Function enforces safety constraints with minimum intervention, solved at each step via quadratic programming.
- Psi-mapping: Translates control commands from the abstract order-parameter space back into operational adjustments (temperature scaling, nucleus threshold, hidden-state steering vectors). Achieves consistency error below 1e-5 across 1,738 control steps.
- Dual-channel intervention: Sampling-parameter modulation (temperature/top-p) for mild corrections, and orthogonal-projection steering of the residual stream for direct hidden-state intervention.

Experimental Campaign

69 experiments across six phases, totaling approximately 15,000 paired generation runs on three model architectures:

Phase | Experiments | Focus
I. Mapping validation | 01-08 | Phi-mapping extraction, failure-mode separation, Psi-mapping consistency
II. Initial CBF control | 09-18 | Intervention frequency, threshold types, time-scale dependence
III. Systematic limits | 19-42 | Harm-threshold discovery, parameter-space boundary mapping
IV. Residual-stream steering | 43-52 | Orthogonal projection, cascade-failure analysis, attractor-switch mechanism
V. Severity-adaptive control | 53-63 | Three-region hysteresis controller, recovery gate, pooled significance
VI. Cross-model validation | 64-69 | GPT-2 Medium (355M), Pythia-160M (160M), transferability analysis

Models tested: GPT-2 Small (117M), GPT-2 Medium (355M), Pythia-160M (160M)
Infrastructure: Apple Silicon (MPS), 16GB RAM, PyTorch

Key Results

Result 1 -- Robust monitoring signal (immediately deployable): The Phi-mapping separates failure modes with very large effect sizes on the primary model (GPT-2 Small):

Comparison | lambda_sem | z
ANOVA (4 categories, N=400) | F=71.76, p=4.43e-37 | F=44.11, p=1.29e-24
Normal vs Fragmentation | d=1.833, p=3.27e-28 | d=1.217, p=2.39e-15
Normal vs Hallucination | d=1.364, p=2.79e-18 | d=0.877, p=3.15e-09

This signal is immediately deployable as a runtime health indicator on any Transformer that exposes attention weights and hidden states.

Result 2 -- Structural limits of discrete token-level CBF control: Systematic experiments reveal three constraints not predicted by continuous-time theory:

- A harm threshold in intervention frequency (10-24 interventions per run) above which control degrades performance
- The necessity of relative over absolute triggering thresholds
- Time-scale dependence requiring re-tuning when generation length changes

The optimal sampling-level configuration achieves harm-neutral control but no positive effect, establishing these as the reachable limits of parameter-space intervention.

Result 3 -- Residual-stream steering and the semantic attractor switch: Direct hidden-state steering via orthogonal projection achieves the first statistically significant positive effect (d = 0.712, p = 0.006). However, replication reveals seed-dependent cascade failures: small corrections to the hidden state produce tiny logit shifts that, under stochastic sampling, select different tokens and push generation into entirely different semantic attractor basins within 2-3 steps. This newly identified mechanism -- the semantic attractor switch -- explains why fixed-parameter interventions face a fundamental tradeoff.

Result 4 -- Severity-adaptive control as Pareto improvement: A three-region hysteresis controller modulates temperature as a function of crisis severity (free sampling at low severity, greedy decoding at high severity, linear interpolation between). This achieves simultaneous positive effects across both vulnerable and resilient seed groups (d = +0.182 and d = +0.522 respectively) with a significant temperature-improvement correlation (r = -0.382, p = 0.002).
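The three-region temperature schedule can be sketched as follows. The thresholds s_lo, s_hi and the base temperature are illustrative assumptions, not the paper's calibrated values, and the hysteresis element (separate rise/fall thresholds to prevent region chattering) is omitted for brevity:

```python
def severity_temperature(severity, s_lo=0.3, s_hi=0.8, t_free=1.0):
    """Three-region schedule: free sampling below s_lo, greedy
    decoding (T -> 0) above s_hi, linear interpolation between.
    Thresholds and base temperature are illustrative assumptions."""
    if severity <= s_lo:
        return t_free                 # low severity: sample freely
    if severity >= s_hi:
        return 0.0                    # high severity: greedy decoding
    frac = (severity - s_lo) / (s_hi - s_lo)
    return t_free * (1.0 - frac)      # linear ramp between regions
```

The design point is that temperature tracks the crisis, rather than being fixed per run: mild deviations keep full sampling diversity, while severe ones collapse the distribution toward the argmax token.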
No fixed temperature produces this Pareto improvement.

Result 5 -- Recovery gate achieves first significant pooled effect: Analysis reveals that not all detected crises require intervention -- some are transient fluctuations from which the model self-recovers. A trend-based recovery gate that detects rising coherence within crisis windows correctly skips 44% of interventions, rescuing a previously negative seed group (d: -0.287 to +0.117) and converting the overall result from non-significant to significant: d = +0.211, p = 0.037, N = 180.

Cross-model validation: Detection generalizes across all three architectures. Intervention requires model-specific severity calibration -- GPT-2 Small thresholds push 67% of GPT-2 Medium crises into the highest severity bucket, suppressing natural recovery. Bucket-level analysis shows intervention is effective when severity is correctly calibrated (Pythia MED-bucket d = +1.296, p = 0.034).

Primary Contribution

A decomposition of the intervention problem into three independent components:

- Phi-mapping -- a robust, model-agnostic monitoring signal (validated across 69 experiments and three architectures)
- Severity-adaptive control -- determines how strongly to intervene
- Recovery gate -- determines whether to intervene at all

This decomposition resolves the detection-intervention asymmetry that dominates the experimental record. The modest intervention effect sizes (d = +0.211) despite robust detection (d > 1.3) motivate a fundamentally different approach at response granularity, developed in the companion paper.

Companion Paper

Beyond Micro-Control: Response-Level Checkpoint Restart for Safe Coherence Recovery in Transformers -- which achieves d = +0.494 to +1.020 with zero iatrogenic harm by shifting from token-level to response-level intervention.

Resources

Repository: github.com/metaSATOKEN/Recyncframework -- full source, test suite, and scripts to reproduce all figures and tables
License: CC BY 4.0 (paper), Apache 2.0 (code)

Keywords: LLM safety, control barrier functions, order parameters, Transformer monitoring, coherence dynamics, residual stream steering, severity-adaptive control
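As a concrete illustration of the recovery gate in Result 5, a trend test over the recent coherence values could look like the following minimal sketch. The least-squares slope estimator and the zero slope threshold are assumptions for illustration; the summary only states that the gate detects rising coherence within a crisis window:

```python
def recovery_gate(coherence_window, min_slope=0.0):
    """Trend-based recovery gate: return True (skip intervention)
    when coherence is rising inside the crisis window, estimated by
    a least-squares slope over recent lambda_sem values.
    min_slope is an illustrative threshold, not the paper's value."""
    n = len(coherence_window)
    if n < 2:
        return False  # too short to estimate a trend: intervene
    x_mean = (n - 1) / 2
    y_mean = sum(coherence_window) / n
    num = sum((x - x_mean) * (y - y_mean)
              for x, y in enumerate(coherence_window))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    return slope > min_slope
```

A gate of this shape realizes the decomposition above: severity-adaptive control decides how strongly to intervene, while the gate decides whether to intervene at all.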
www.synapsesocial.com/papers/69cf5ecb5a333a821460d661 — DOI: https://doi.org/10.5281/zenodo.19148449