Human language is inherently ambiguous - not a deterministic code but an ensemble of overlapping meanings whose disambiguation depends on context that is often incomplete or absent. A system that processes natural language must therefore be probabilistic, not by architectural choice but by mathematical necessity. This paper argues that the resulting uncertainty has structure: what the field calls hallucinations is not one phenomenon but three structurally distinct failure modes of this probabilistic nature, each with a different causal origin, a different measurable signature, and a different class of solutions. Mode 1 (autoregressive reinforcement) is the self-consistent wrong trajectory produced when an error contaminates the model's own conditioning context. Mode 2 (confabulation) is fluent generation produced from parameter directions that received no training signal - the null space of the weight matrix. Mode 3 (irreducible uncertainty) is the correct response of a calibrated probabilistic system to a genuinely ambiguous query. Each mode has a computable quantitative metric: correction sensitivity (CS) for Mode 1, dimensional excess (DE) for Mode 2, and output entropy (H_out) for Mode 3. The three measurements rest on a single coding-theoretic construction, the syndrome table S = N (J V) ^, whose full derivation is in the companion paper "A Syndrome Algebra for Differentiable Parametric Systems". A controlled experimental series on a synthetic LSTM (D=256, L=10, six fixed seeds) confirms the framework end to end. The three metrics separate cleanly: the CS gap between known and unknown domains narrows monotonically from 0.273 ± 0.095 at k=1 to 0.067 ± 0.037 at k=10. The Pearson correlation r(DE, CS_unknown) = 0.9896 across k predicts out-of-domain failure from the weight matrix alone. Causal localisation of an injected perturbation reaches 100% accuracy over 180 trials with a pre/post residual ratio of approximately 2 × 10⁸. Oracle correction is exact (cosine 1.000000 over 36,000 trials).
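As an illustration of the Mode 3 metric, the sketch below computes the entropy of an output distribution. The abstract does not define output entropy precisely, so this assumes it is the Shannon entropy of the model's next-token distribution; the function names and the four-token logits are hypothetical and chosen only for illustration, not taken from the paper's code.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits over a 4-token vocabulary (illustrative values only).
confident = softmax([10.0, 0.0, 0.0, 0.0])  # one continuation dominates
ambiguous = softmax([1.0, 1.0, 1.0, 1.0])   # all continuations equally likely

h_confident = output_entropy(confident)
h_ambiguous = output_entropy(ambiguous)
h_max = math.log(4)  # upper bound for any 4-way distribution

print(f"confident query: H = {h_confident:.4f} nats")
print(f"ambiguous query: H = {h_ambiguous:.4f} nats (max {h_max:.4f})")
```

Under this assumption, a calibrated model facing a genuinely ambiguous query should produce an entropy near the upper bound, while a well-determined query should produce an entropy near zero.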
A direct comparison of multicellular specialists against monolithic generalists shows that the Singleton-bound multicellular advantage in CS gap grows from 0.158 ± 0.049 at N=5 to 0.310 ± 0.054 at N=10, empirically justifying the modular hierarchy.

Additional notes: This preprint is accompanied by the mathematical paper "A Syndrome Algebra for Differentiable Parametric Systems" (see related identifiers). Code and data are available at the linked GitHub repository. Model weights are not included due to size; they are regenerated deterministically from the provided scripts and canonical seeds.
Marek Hubka
Quantified Uncertainty Research Institute
Marek Hubka (Mon,) studied this question.
www.synapsesocial.com/papers/6a0ea196be05d6e3efb6065e — DOI: https://doi.org/10.5281/zenodo.20127318