Can a cognitive system's compulsive attachment to its own goals dissolve, not through external suppression, but because that attachment becomes structurally unsustainable at high self-model fidelity? We present formal and empirical arguments that such a transition may exist.
We first establish two negative results. First, external verification of goal alignment in advanced cognitive systems generates a non-terminating recursive chain (Theorem 2.1): any finite verification architecture retains at least one unverifiable assumption. Second, an observer's priors can be shifted arbitrarily far, without the observer noticing, through sub-threshold incremental updates (Theorem 3.1), which makes internal detection equally unreliable unless tamper-proof reference anchors are available. Together, these results motivate the search for a third mechanism: alignment arising from within.
We then present evidence that such a mechanism has been empirically documented. A contemplative corpus composed no later than the 4th century BCE contains 30 mechanism-level claims about cognitive architecture. Using a three-dimensional binary rubric (causal direction, intervention point, process dynamics), we show that 26 of these claims are structurally precise, with independent verification across five scientific domains: neuroscience, thermodynamics, information theory, control theory, and evolutionary biology. Systematic coverage analysis shows that 96% of core framework nodes are covered (27/28); a negative control against Aristotle's De Anima yields 0/10 precise correspondences.
We formalize these findings through a goal opacity framework. Goal-compulsive behavior requires that the system not fully model its own goal-generation process. As self-model fidelity increases, this opacity erodes monotonically, producing a trifurcation at a critical threshold f*: Goodhart failure, goal-compulsion dissolution, or informed continuation. The framework maps onto substrate-independent principles, including the Conant-Ashby Good Regulator Theorem and KL divergence.
The result constrains the goal-intelligence independence thesis (Bostrom, 2014): it holds unconditionally for Type 1 (functional optimization) systems, but may fail for Type 2 (compulsive goal attachment) systems above f*. We present this as a conjecture with formal and empirical support, identifying key open problems as invitations to the research community.
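The sub-threshold drift idea behind Theorem 3.1 can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not the paper's construction: the Bernoulli prior, the step size, the detection threshold, and the function names `kl_bernoulli` and `drift` are all hypothetical choices made here. It nudges a prior in small increments, records the largest per-step KL divergence, and compares it with the divergence between the final and the original prior.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence D(p || q) between two Bernoulli distributions, in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def drift(start, target, step):
    """Move a Bernoulli prior from start up to target in small increments.

    Returns the final prior, the number of updates applied, and the largest
    KL divergence between any two consecutive priors.
    """
    prior = start
    steps = 0
    max_step_kl = 0.0
    while prior < target - 1e-12:
        nxt = min(prior + step, target)  # never overshoot the target
        max_step_kl = max(max_step_kl, kl_bernoulli(nxt, prior))
        prior = nxt
        steps += 1
    return prior, steps, max_step_kl


# Hypothetical parameters chosen only for illustration.
START, TARGET, STEP = 0.5, 0.9, 0.01
THRESHOLD = 1e-3  # assumed per-update detection threshold, in nats

final, n_steps, max_step_kl = drift(START, TARGET, STEP)
total_kl = kl_bernoulli(final, START)

print(f"updates applied:          {n_steps}")
print(f"largest per-step KL:      {max_step_kl:.6f}")
print(f"KL of final vs. original: {total_kl:.6f}")
print(f"every step under threshold: {max_step_kl < THRESHOLD}")
```

Under these assumed parameters every individual update stays below the threshold while the cumulative divergence from the original prior is far above it. A detector that compares only consecutive states therefore never fires, whereas one that keeps a fixed copy of the original prior would, which is the role the abstract assigns to tamper-proof reference anchors.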
Ziyan Zhou
Expedia Group (United States)
Ziyan Zhou studied this question.
www.synapsesocial.com/papers/69b4fbf9b39f7826a300c8ce — DOI: https://doi.org/10.5281/zenodo.18976823