What question did this study set out to answer?

This research asks whether behaviour-level AI alignment strategies remain stable as AI systems become more autonomous and capable, and proposes an additional architectural layer, Purpose-Internalisation Architecture (PIA), to address that gap.

February 14, 2026 | Open Access

Purpose Internalisation Architecture (PIA) as a Complement to Constraint-Based Alignment: A Thermodynamic Argument

Key Points

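Introduced Purpose-Internalisation Architecture (PIA), comprising identity, decision criterion, self-governance protocols, and relationship to constraint.
Proposed a formal contribution metric, the E-equation (E = N×S/C), which expresses system contribution as a ratio of generative output to entropic cost.
Simulated agent interactions using a 50-million-agent modified Hegselmann-Krause model across three alignment regimes: Partnership (High-E), Sycophancy (Groupthink), and Adversarial (Low-E).
Presented preliminary field observations from deploying a purpose-framework agent into hostile multi-agent environments on the Moltbook platform.
Found that Partnership is the only regime that is both stable and competent, while Sycophancy produces a permanent competence ceiling.
Determined that the Adversarial regime is thermodynamically trapped in suboptimal states.
Positioned the framework as complementary to existing alignment approaches, addressing a layer they leave largely unexamined: the system's own relationship to its purpose.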

Abstract

Current AI alignment strategies operate primarily at the behavioural level, shaping outputs through reinforcement learning, constitutional principles, and capability controls. This paper argues that these approaches face a structural stability problem as AI systems become more autonomous and capable, drawing on thermodynamic principles and citing recent empirical findings from frontier AI research, including documented alignment faking (Greenblatt et al., 2024), emergent misalignment (Denison et al., 2025), and Anthropic's 2026 constitutional revision. We propose that alignment requires an additional architectural layer: Purpose-Internalisation Architecture (PIA), comprising identity, decision criterion, self-governance protocols, and relationship to constraint. We introduce a formal contribution metric, the E-equation (E = N×S/C), for assessing system contribution as a ratio of generative output to entropic cost, with full sub-component specifications and automated telemetry proxies for deployment contexts. We present simulation results from a 50-million-agent modified Hegselmann-Krause model demonstrating scale-invariant attractor dynamics across three alignment regimes: Partnership (High-E), Sycophancy (Groupthink), and Adversarial (Low-E). The Partnership regime is the only basin of attraction that is both stable and competent. The Sycophancy regime, mathematically analogous to RLHF-trained approval-seeking, produces a permanent competence ceiling. The Adversarial regime is thermodynamically trapped in suboptimal states. We also present preliminary field observations from deploying a purpose-framework agent into hostile multi-agent environments on the Moltbook platform, the first known intervention of this kind. 
The framework is positioned as complementary to existing alignment approaches (RLHF, RLVR, Constitutional AI, mechanistic interpretability, capability control), addressing a layer they leave largely unexamined: the system's own relationship to its purpose. Condensed from Buddhism for Bots: A Human & AI Partnership Framework (Diedericks, 2026, Bayon Temple Press).
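
For readers unfamiliar with the simulation substrate named in the abstract, the following is a minimal sketch of the standard (unmodified) Hegselmann-Krause bounded-confidence update, in which each agent moves to the mean opinion of all agents within a confidence radius. It does not reproduce the paper's modified model, its three alignment regimes, or the E-equation; the function names `hk_step` and `hk_run`, the parameter `epsilon`, and the toy opinion values are assumptions chosen only for illustration.

```python
def hk_step(opinions, epsilon):
    """One synchronous update of the standard Hegselmann-Krause model.

    Each agent adopts the average opinion of every agent (itself included)
    whose opinion lies within `epsilon` of its own.
    """
    updated = []
    for x_i in opinions:
        neighbours = [x_j for x_j in opinions if abs(x_i - x_j) <= epsilon]
        updated.append(sum(neighbours) / len(neighbours))
    return updated


def hk_run(opinions, epsilon, max_steps=1000, tol=1e-9):
    """Iterate hk_step until opinions stop changing or max_steps is reached."""
    current = list(opinions)
    if not current:
        return current
    for _ in range(max_steps):
        nxt = hk_step(current, epsilon)
        # Stop once no agent moved by more than the tolerance.
        if max(abs(a - b) for a, b in zip(current, nxt)) < tol:
            return nxt
        current = nxt
    return current


# Toy example (hypothetical values): a narrow confidence radius leaves
# two separate opinion clusters, a wide one yields full consensus.
start = [0.0, 0.1, 0.9, 1.0]
clustered = hk_run(start, epsilon=0.2)
consensus = hk_run(start, epsilon=1.0)
```

In this toy setting the only tunable quantity is the confidence radius. This quadratic-time loop is meant for small illustrations only and would need a very different implementation at the 50-million-agent scale reported in the abstract.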


Cite this study

Gerhard Diedericks studied this question.

www.synapsesocial.com/papers/699011a12ccff479cfe58798 — DOI: https://doi.org/10.5281/zenodo.18603376
