We report the first real-model empirical audit of the GPTMem framework on GPT-2 Small (124M parameters). Three principal findings emerge. First, the frozen-key regime condition (Test 0) fails categorically: the admissibility ratio ε_FK exceeds the threshold by four to five orders of magnitude in all tested layers, rendering the frozen-attention linearisation inapplicable to this model. Second, the holding function h(Δ) = ‖LS_Δ‖_op exhibits clean offset-algebraic decay, h(Δ) = c + AΔ^−β, with an observed residual holding floor c > 0 in all four target layers (R² > 0.98), confirming non-Markovian memory structure while falsifying the pure-algebraic prediction β ≈ 1.5–3. Third, causal intervention reveals that the four “Traitor Heads” identified in the Paris Lobotomy are not a homogeneous class: only L10H10 has positive tail and floor effects; the other three heads have negative effects upon ablation. L10H10 alone accounts for ≈64% of the full four-head tail effect and ≈70% of the floor effect in Layer 10, with z > 20 against random single-head controls and strong superadditive synergy with L10H0. Layer 11 operates as downstream rebalancing, not as a tail origin. The categorical core of GPTMem (the category, structural lemmas, Three Laws) remains untouched; what these data falsify is the frozen-key empirical bridge, not the theoretical programme. The intervention-first route through tail/floor metrics and matched controls emerges as the viable empirical access path.
Jonas Jakob Gebendorfer studied this question.
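The offset-algebraic form h(Δ) = c + AΔ^−β quoted in the abstract can be illustrated with a minimal fitting sketch. This is not the fitting procedure used in the audit, which is not described here. It assumes a simple approach: scan β over a grid and, for each β, solve for the floor c and amplitude A by ordinary least squares. The function name `fit_offset_power_law`, the β grid, and the synthetic data (c = 0.3, A = 2.0, β = 0.8) are hypothetical and chosen only for illustration; they are not measured values.

```python
def fit_offset_power_law(deltas, h_values, betas):
    """Fit h(D) = c + A * D**(-beta) by scanning beta over a grid.

    For each candidate beta the model is linear in (c, A), so both are
    obtained in closed form by ordinary least squares on x = D**(-beta).
    Hypothetical helper for illustration only.
    """
    n = len(deltas)
    mean_h = sum(h_values) / n
    ss_tot = sum((h - mean_h) ** 2 for h in h_values)
    best = None
    for beta in betas:
        x = [d ** (-beta) for d in deltas]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor (e.g. beta == 0), skip
        sxy = sum((xi - mean_x) * (hi - mean_h) for xi, hi in zip(x, h_values))
        amplitude = sxy / sxx              # A: slope of h against D**(-beta)
        floor = mean_h - amplitude * mean_x  # c: intercept, the residual floor
        sse = sum((hi - (floor + amplitude * xi)) ** 2
                  for xi, hi in zip(x, h_values))
        if best is None or sse < best["sse"]:
            best = {"c": floor, "A": amplitude, "beta": beta, "sse": sse}
    if best is None:
        raise ValueError("no usable beta in the grid")
    best["r2"] = 1.0 - best["sse"] / ss_tot
    return best


# Synthetic, noise-free data (assumed values, not results from the audit).
deltas = list(range(1, 65))
h_values = [0.3 + 2.0 * d ** (-0.8) for d in deltas]

# Assumed search grid: beta from 0.05 to 3.00 in steps of 0.05.
betas = [0.05 * k for k in range(1, 61)]

fit = fit_offset_power_law(deltas, h_values, betas)
print(fit["c"], fit["A"], fit["beta"], fit["r2"])
```

A fitted c clearly above zero corresponds to the residual holding floor described in the abstract, while c ≈ 0 would correspond to pure algebraic decay. On real, noisy measurements a nonlinear least-squares routine with uncertainty estimates would likely be preferable to a grid scan.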