What question did this study set out to answer?

Assess the performance of the GPTMem framework on GPT-2 Small and evaluate its memory structure.

March 16, 2026 | Open Access

From Frozen-Key Failure to Causal Tail Carriers: A Real-Model Audit of GPTMem on GPT-2 Small

Key Points

Performed an empirical audit of GPTMem on GPT-2 Small with 124 million parameters.
Analyzed the admissibility ratio and holding function across four target layers.
Conducted causal interventions to assess effects of identified heads in the model.
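The frozen-key regime condition (Test 0) fails in every tested layer: ε_FK exceeds the threshold by four to five orders of magnitude.
The holding function shows offset-algebraic decay with a nonzero floor in all four target layers (fit R² > 0.98), indicating non-Markovian memory.
Only one of the four identified heads (L10H10) has positive tail and floor effects, accounting for ≈64% of the four-head tail effect and ≈70% of the floor effect in Layer 10; the other three have negative effects upon ablation.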

Abstract

We report the first real-model empirical audit of the GPTMem framework on GPT-2 Small (124M parameters). Three principal findings emerge. First, the frozen-key regime condition (Test 0) fails categorically: the admissibility ratio ε_FK exceeds the threshold by four to five orders of magnitude in all tested layers, rendering the frozen-attention linearisation inapplicable to this model. Second, the holding function h(Δ) = ‖LS_Δ‖_op exhibits clean offset-algebraic decay, h(Δ) = c + AΔ^−β, with an observed residual holding floor c > 0 in all four target layers (R² > 0.98), confirming non-Markovian memory structure while falsifying the pure-algebraic prediction β ≈ 1.5–3. Third, causal intervention reveals that the four “Traitor Heads” identified in the Paris Lobotomy are not a homogeneous class: only L10H10 has positive tail and floor effects; the other three heads have negative effects upon ablation. L10H10 alone accounts for ≈64% of the full four-head tail effect and ≈70% of the floor effect in Layer 10, with z > 20 against random single-head controls and strong superadditive synergy with L10H0. Layer 11 operates as downstream rebalancing, not as a tail origin. The categorical core of GPTMem (the category, structural lemmas, Three Laws) remains untouched; what these data falsify is the frozen-key empirical bridge, not the theoretical programme. The intervention-first route through tail/floor metrics and matched controls emerges as the viable empirical access path.
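
As an illustration of the offset-algebraic form h(Δ) = c + AΔ^−β quoted in the abstract, the following minimal sketch fits that model to a decay curve. It is not the authors' fitting procedure: the grid search over β, the function name `fit_offset_power_law`, and the synthetic parameter values are all assumptions made for the example. The idea is that for a fixed β the model is linear in (c, A), so ordinary least squares gives both in closed form, and the β with the highest R² is kept.

```python
def fit_offset_power_law(deltas, h_values, betas):
    """Fit h(delta) = c + A * delta**(-beta) by grid search over beta.

    For each candidate beta the model is linear in (c, A), so the
    intercept c and slope A follow from simple linear regression of
    h on x = delta**(-beta). Returns the candidate with the best R^2.
    """
    n = len(deltas)
    mean_h = sum(h_values) / n
    ss_tot = sum((h - mean_h) ** 2 for h in h_values)
    best = None
    for beta in betas:
        x = [d ** (-beta) for d in deltas]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (hi - mean_h) for xi, hi in zip(x, h_values))
        A = sxy / sxx
        c = mean_h - A * mean_x  # residual holding floor
        ss_res = sum((hi - (c + A * xi)) ** 2 for xi, hi in zip(x, h_values))
        r2 = 1.0 - ss_res / ss_tot
        if best is None or r2 > best["r2"]:
            best = {"c": c, "A": A, "beta": beta, "r2": r2}
    return best


# Synthetic, noise-free curve. These values are hypothetical and are
# NOT the paper's measurements.
true_c, true_A, true_beta = 0.2, 1.5, 0.8
deltas = list(range(1, 65))
h_values = [true_c + true_A * d ** (-true_beta) for d in deltas]

betas = [0.05 * k for k in range(1, 81)]  # candidate exponents 0.05 .. 4.0
fit = fit_offset_power_law(deltas, h_values, betas)
print(fit)
```

A fitted floor c clearly above zero is what separates the offset-algebraic form from a pure power law, which would force c = 0. On real measurements a finer β grid or a nonlinear optimiser would likely be preferable.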
