Autoregressive language models compress a rich hidden state into a single token and build their next state on that compression. We ask whether this recursive feedback loop is causally necessary for the trajectory-level structure observed in generation. A controlled intervention on a bilingual mixture of character-level transformers holds components, weights, seeds, and per-step distributions fixed while removing only token re-injection. With the loop severed, per-step divergence fails to accumulate (Jensen-Shannon divergence (JSD) slope at machine zero) and regime structure collapses. Close the loop and persistent regime-switching dynamics return, with a positive divergence rate (0.689 ± 0.012 nats/char) at every tested temperature. The same intervention inside a jointly trained bilingual model shows the same contrast. Recursive generation departs from the seed (fidelity 79% → 54%), switches regimes (median 3), and distributes occupancy (70%). Static generation stays seed-locked (93% fidelity), monolithic (95% occupancy), and largely non-switching (median 0). Regime identity is not predictable from the first 1000 generated characters and does not separate under geometric clustering at any layer, yet a linear probe recovers it at 96%. Four independent methods, including an endogenous decomposition that requires no reference models, agree on the same partition. Partial-feedback experiments show that any nonzero reinjection produces dynamics; steady-state divergence is invariant to feedback rate across three orders of magnitude, to token content beyond variability, and to reinjection lag up to 100 steps. The theory's sufficient conditions for exact metastability do not hold in this setting, yet a mechanism consistent with the theory operates nonetheless. In this construction, the loop converts local per-step disagreement into global trajectory-level dynamics. The basin is not a place in latent space: it is the trajectory.
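For readers unfamiliar with the divergence measure named above, the following is a minimal sketch of the Jensen-Shannon divergence between two next-character distributions, reported in nats. It is an illustration under stated assumptions, not the paper's implementation: the function name `jsd` and the example distributions are hypothetical, and the sketch does not reproduce how the divergence slope or rate in the abstract was estimated.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in nats.

    p and q are assumed to be equal-length sequences of probabilities that
    each sum to 1 (hypothetical inputs, e.g. two models' next-character
    distributions at one generation step).
    """
    # Midpoint (mixture) distribution.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # KL(x || y) in nats; terms with x_i == 0 contribute nothing.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Identical distributions do not diverge; disjoint ones reach the ln(2) bound.
same = jsd([0.5, 0.5], [0.5, 0.5])
disjoint = jsd([1.0, 0.0], [0.0, 1.0])
```

With natural logarithms the value is bounded above by ln 2 (about 0.693 nats), which is the scale on which a per-character figure in nats can be read.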
Claudio Irrgang
Claudio Irrgang (Mon,) studied this question.
www.synapsesocial.com/papers/69df2c77e4eeef8a2a6b1929 — DOI: https://doi.org/10.5281/zenodo.19547461