May 21, 2026 (Open Access)

Three Measurable Failure Modes of Large Language Models

Key Points

The paper identifies and categorizes three failure modes of large language models that arise from the inherent ambiguity of human language.
The study runs a controlled experimental series on a synthetic LSTM (D=256, L=10, six fixed seeds).
It uses three quantitative metrics, one per failure mode: correction sensitivity (CS), dimensional excess (DE), and output entropy.
It tests causal localization of injected perturbations and compares multicellular specialist models against monolithic generalists.
The correction sensitivity gap between known and unknown domains narrowed from 0.273 ± 0.095 at k=1 to 0.067 ± 0.037 at k=10.
The Pearson correlation between dimensional excess and correction sensitivity on unknown domains was r(DE, CS_unknown) = 0.9896.
Causal localization of an injected perturbation reached 100% accuracy across 180 trials, with a pre/post residual ratio of approximately 2 × 10⁸.
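
To make the reported correlation concrete, here is a minimal Python sketch of how a Pearson coefficient such as r(DE, CS_unknown) is computed from paired measurements. The function name `pearson_r` and the arrays `de` and `cs_unknown` are assumptions of this sketch; the values are hypothetical placeholders chosen for illustration, not the paper's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical placeholder measurements, one pair per value of k (NOT the paper's data).
de = [1.0, 2.0, 3.0, 4.0]
cs_unknown = [0.1, 0.2, 0.3, 0.4]

r = pearson_r(de, cs_unknown)
print(f"r = {r:.4f}")
```

Because the placeholder pairs lie exactly on a line, the coefficient here is 1 up to floating-point error. A value such as the reported 0.9896 indicates a nearly, but not perfectly, linear relationship.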

Abstract

Human language is inherently ambiguous - not a deterministic code but an ensemble of overlapping meanings whose disambiguation depends on context that is often incomplete or absent. A system that processes natural language must therefore be probabilistic, not by architectural choice but by mathematical necessity. This paper argues that the resulting uncertainty has structure: what the field calls hallucinations is not one phenomenon but three structurally distinct failure modes of this probabilistic nature, each with a different causal origin, a different measurable signature, and a different class of solutions. Mode 1 (autoregressive reinforcement) is the self-consistent wrong trajectory produced when an error contaminates the model's own conditioning context. Mode 2 (confabulation) is fluent generation produced from parameter directions that received no training signal - the null space of the weight matrix. Mode 3 (irreducible uncertainty) is the correct response of a calibrated probabilistic system to a genuinely ambiguous query. Each mode has a computable quantitative metric: correction sensitivity (CS), dimensional excess (DE), and output entropy (H_out). The three measurements rest on a single coding-theoretic construction, the syndrome table S = N (J V) ^, whose full derivation is in the companion paper "A Syndrome Algebra for Differentiable Parametric Systems". A controlled experimental series on a synthetic LSTM (D=256, L=10, six fixed seeds) confirms the framework end to end. The three metrics separate cleanly: the CS gap between known and unknown domains narrows monotonically from 0.273 ± 0.095 at k=1 to 0.067 ± 0.037 at k=10. The Pearson correlation r(DE, CS_unknown) = 0.9896 across k shows that DE predicts out-of-domain failure from the weight matrix alone. Causal localisation of an injected perturbation reaches 100% accuracy over 180 trials with a pre/post residual ratio of approximately 2 × 10⁸. Oracle correction is exact (cosine 1.000000 over 36,000 trials). A direct comparison of multicellular specialists against monolithic generalists shows the Singleton-bound multicellular advantage grows from 0.158 ± 0.049 at N=5 to 0.310 ± 0.054 at N=10 in CS gap, empirically justifying the modular hierarchy.

Additional notes: This preprint is accompanied by the mathematical paper "A Syndrome Algebra for Differentiable Parametric Systems" (see related identifiers). Code and data are available at the linked GitHub repository. Model weights are not included due to size; they are regenerated deterministically from the provided scripts and canonical seeds.
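
The abstract names output entropy as the metric for Mode 3 but does not define it here. As a minimal sketch, assuming H_out is the Shannon entropy of the model's softmax distribution over candidate outputs (the companion paper's definition may differ), the following Python computes it from raw logits. The function names and example logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output_entropy(logits):
    """Shannon entropy, in nats, of the softmax distribution (an assumed form of H_out)."""
    probs = softmax(logits)
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits over four candidate outputs.
confident = output_entropy([10.0, 0.0, 0.0, 0.0])  # one option dominates
ambiguous = output_entropy([0.0, 0.0, 0.0, 0.0])   # all options equally likely

print(f"confident: {confident:.4f} nats")
print(f"ambiguous: {ambiguous:.4f} nats")
```

A uniform distribution over four options gives the maximum possible entropy, ln 4 nats, while a sharply peaked distribution gives a value near zero. Under this reading, high entropy on a genuinely ambiguous query is the calibrated behaviour the abstract describes, not an error.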

Authors

Marek Hubka

Institutions

Quantified Uncertainty Research Institute
