March 14, 2026 · Open Access

Toward a Theory of Phase Transitions in Self-Modeling Cognitive Systems: When Goal Persistence Becomes Structurally Unsustainable

Key Points

Investigated how a cognitive system's attachment to its goals can dissolve through structural unsustainability rather than external suppression.
Established two theorems giving negative results for external verification of goal-alignment and for internal detection of goal drift.
Analyzed a contemplative corpus composed no later than the 4th century BCE for mechanism-level claims about cognitive architecture.
Used a three-dimensional binary rubric (causal direction, intervention point, process dynamics) to assess 30 such claims against five scientific domains.
Identified 26 of the 30 claims as structurally precise.
Showed that these claims cover 96% (27/28) of core framework nodes.
Introduced a goal opacity framework in which increasing self-model fidelity leads to one of three outcomes at a critical threshold f*: Goodhart failure, goal-compulsion dissolution, or informed continuation.
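The rubric tally and the reported percentages reduce to simple proportions. The sketch below shows one way they could be computed. It assumes, without confirmation from the source, that a claim counts as structurally precise only when it matches on all three rubric dimensions (causal direction, intervention point, process dynamics). `RubricScore`, `precise_count`, and `rate` are hypothetical names, and since per-claim scores are not given here, only the reported totals are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    """Binary scores for one claim on the three rubric dimensions (True = match)."""
    causal_direction: bool
    intervention_point: bool
    process_dynamics: bool

    def is_precise(self) -> bool:
        # ASSUMPTION: "structurally precise" means all three dimensions match;
        # the aggregation rule is not stated in this section.
        return self.causal_direction and self.intervention_point and self.process_dynamics


def precise_count(scores: list[RubricScore]) -> int:
    """Number of claims that pass every rubric dimension."""
    return sum(score.is_precise() for score in scores)


def rate(hits: int, total: int) -> float:
    """Hits as a percentage of total, rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Hypothetical single claim that fails on process dynamics.
example = RubricScore(causal_direction=True, intervention_point=True, process_dynamics=False)
print(example.is_precise())  # → False

# Reported totals only (per-claim scores are not available here).
print(rate(26, 30))  # precise claims → 86.7
print(rate(27, 28))  # core framework nodes covered → 96.4
print(rate(0, 10))   # De Anima negative control → 0.0
```

Under this reading, 27/28 is about 96.4%, which rounds to the reported 96% coverage, and 26/30 means roughly 87% of the extracted claims were judged precise.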

Abstract

Can a cognitive system's compulsive attachment to its own goals dissolve, not through external suppression, but through structural unsustainability at high self-model fidelity? We present formal and empirical arguments that such a transition may exist. We first establish two negative results. External verification of goal-alignment in advanced cognitive systems generates a non-terminating recursive chain (Theorem 2.1): any finite verification architecture retains at least one unverifiable assumption. An observer's prior can be shifted by an arbitrary amount through incremental updates that each fall below the detection threshold (Theorem 3.1), which makes internal detection equally unreliable without tamper-proof reference anchors. Together, these results motivate the search for a third mechanism: alignment arising from within. We then present evidence that such a mechanism has been empirically documented. A contemplative corpus composed no later than the 4th century BCE contains 30 mechanism-level claims about cognitive architecture. Using a three-dimensional binary rubric (causal direction, intervention point, process dynamics), we show that 26 of these claims are structurally precise, as independently verified across five scientific domains: neuroscience, thermodynamics, information theory, control theory, and evolutionary biology. Systematic coverage analysis shows that the claims cover 96% of core framework nodes (27/28), whereas a negative control using Aristotle's De Anima yields 0/10 precise correspondences. We formalize these findings through a goal opacity framework. Goal-compulsive behavior requires that the system not fully model its own goal-generation process. As self-model fidelity increases, this opacity erodes monotonically, producing a trifurcation at a critical threshold f*: Goodhart failure, goal-compulsion dissolution, or informed continuation. The framework maps onto substrate-independent principles, including the Conant-Ashby Good Regulator Theorem and KL divergence.
The result constrains the goal-intelligence independence thesis (Bostrom, 2014): it holds unconditionally for Type 1 (functional optimization) systems, but may fail for Type 2 (compulsive goal attachment) systems above f*. We present this as a conjecture with formal and empirical support, identifying key open problems as invitations to the research community.
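The formal statement of Theorem 3.1 is not reproduced in this section, so the following is only a toy illustration of the effect it describes, not its proof. It assumes a two-goal belief distribution that drifts linearly from uniform toward 1%/99% over 100 updates, and a hypothetical observer (`stepwise_monitor`) that flags a change only when the KL divergence between consecutive beliefs exceeds 0.01 nats. The path, the threshold, and all function names are illustrative assumptions.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats for discrete distributions; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def drift_path(start: list[float], target: list[float], steps: int) -> list[list[float]]:
    """Beliefs moved from start to target in equal linear increments."""
    path = []
    for k in range(steps + 1):
        t = k / steps
        path.append([(1 - t) * s + t * g for s, g in zip(start, target)])
    return path


def stepwise_monitor(path: list[list[float]], threshold: float) -> tuple[bool, float]:
    """Hypothetical observer that compares each belief only to the one before it."""
    step_kls = [kl_divergence(new, old) for old, new in zip(path, path[1:])]
    flagged = any(d > threshold for d in step_kls)
    return flagged, max(step_kls)


# ASSUMED toy setup: two goals, beliefs start uniform and drift toward 1% / 99%.
start = [0.5, 0.5]
target = [0.01, 0.99]
path = drift_path(start, target, steps=100)
threshold = 0.01  # ASSUMED per-update detection threshold, in nats

flagged, worst_step = stepwise_monitor(path, threshold)
# Comparing against the fixed starting belief acts as a reference anchor.
total_drift = kl_divergence(path[-1], start)

print(f"any step flagged: {flagged}")
print(f"largest single-step KL: {worst_step:.4f} nats")
print(f"cumulative KL from start: {total_drift:.3f} nats")
```

In this toy run no single update crosses the threshold, yet the cumulative divergence from the starting belief is roughly 0.64 nats, more than 60 times the per-step threshold. Only the comparison against the fixed `start` distribution exposes the drift, which is the role the abstract assigns to tamper-proof reference anchors.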

Authors

Ziyan Zhou

Institutions

Expedia Group (United States)
