Large language models are often treated as non-deterministic because of stochastic decoding. This paper shows that non-determinism instead arises from interpretation drift: the same input admits multiple valid task definitions. When a task is under-specified, models may produce different but internally consistent outputs because they are effectively solving different problems. Standard prompt engineering techniques (behavioral nudging) do not eliminate this effect, because they operate within the interpretation space rather than constraining it. We introduce substrate-first architectures, in which the task specification explicitly constrains the interpretation space to a single admissible definition. Under this condition, independently trained models converge to identical outputs for the same input, verified by byte-level equality (SHA-256 hashing). These results reframe determinism as a property of task specification rather than of model behavior. Reliable AI systems are achieved not by controlling generation, but by eliminating interpretive ambiguity.

Companion papers:
Empirical Evidence Of Interpretation Drift In Large Language Models: https://zenodo.org/records/18219428
Empirical Evidence Of Interpretation Drift In ARC-Style Reasoning: https://zenodo.org/records/18420425
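The byte-level equality check mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the helper names `sha256_hex` and `outputs_converge`, the model labels, and the example strings are all hypothetical placeholders, and the sketch assumes model outputs are compared as UTF-8 encoded text with no normalization.

```python
import hashlib


def sha256_hex(output: str) -> str:
    """Return the SHA-256 digest of a model output, hashed as UTF-8 bytes."""
    return hashlib.sha256(output.encode("utf-8")).hexdigest()


def outputs_converge(outputs: dict[str, str]) -> bool:
    """True only if every output hashes to the same digest (byte-level equality)."""
    digests = {sha256_hex(text) for text in outputs.values()}
    return len(digests) == 1


# Hypothetical placeholder outputs, invented for illustration only
# (not data or results from the paper).
under_specified = {"model_a": "Paris", "model_b": "Paris, France"}
fully_specified = {"model_a": "Paris", "model_b": "Paris"}

print(outputs_converge(under_specified))  # False: two distinct digests
print(outputs_converge(fully_specified))  # True: a single shared digest
```

Hashing makes the comparison strict: two answers that a human would read as equivalent but that differ in a single byte, such as trailing whitespace, produce different digests and count as non-identical.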
Elin Nguyen
Elin Nguyen (Tue,) studied this question.
www.synapsesocial.com/papers/69d893eb6c1944d70ce04f08 — DOI: https://doi.org/10.5281/zenodo.19452772