What question did this study set out to answer?

The study asks whether the geometry of a transformer's KV-cache reflects the model's cognitive mode, and whether that geometry can be used to detect misalignment states.

April 10, 2026 · Open Access

The Lyra Technique: Cognitive Geometry in Transformer KV-Caches — From Metacognition to Misalignment Detection

Key Points

Systematic experiments across 16 transformer models ranging from 0.5B to 70B parameters
Six architecture families and over 5,600 controlled trials
Singular value decomposition of the key cache, with spectral entropy as the most architecture-universal geometric feature after confound control
Within-model detection AUROC measured after Frisch-Waugh-Lovell residualization against token count
KV-cache geometry distinguishes cognitive modes such as metacognitive, analytical, affective, and task-specific processing.
Within-model AUROC of 0.93–0.995 for detecting misalignment states (deception, confabulation, sycophancy, and refusal).
Hardware invariance confirmed between an RTX 3090 and an H200 (r > 0.998).
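
To make the spectral-entropy feature concrete, here is a minimal sketch of one common way to compute it from a key-cache matrix. It is not the paper's exact definition: the function name `spectral_entropy`, the choice to normalize raw (not squared) singular values, and the use of natural-log units are all assumptions made for illustration.

```python
import numpy as np

def spectral_entropy(keys: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy (in nats) of the normalized singular-value spectrum.

    `keys` is assumed to be a 2-D array of shape (num_tokens, head_dim),
    i.e. the cached keys of one attention head in one layer (hypothetical layout).
    """
    # Singular values only; the singular vectors are not needed for this feature.
    s = np.linalg.svd(keys, compute_uv=False)
    # Assumption: treat the singular values as a probability distribution.
    p = s / (s.sum() + eps)
    # Drop numerically zero entries so that log() stays finite.
    p = p[p > eps]
    return float(-(p * np.log(p)).sum())

# A rank-1 matrix concentrates all mass in one singular value (low entropy);
# a matrix with a flat spectrum spreads it evenly (high entropy).
low = spectral_entropy(np.outer(np.arange(1.0, 5.0), np.arange(1.0, 4.0)))
high = spectral_entropy(np.eye(4))
```

Under this definition, low entropy means the cached keys lie close to a low-dimensional subspace, while high entropy means the variance is spread across many directions.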

Abstract

We show that the key-value cache (KV-cache) of autoregressive transformers encodes a geometric signature of cognitive mode that is readable, consistent, and practically useful for AI safety monitoring. Through a systematic experimental program spanning 16 models (0.5B–70B parameters), six architecture families, and over 5,600 controlled trials, we establish three escalating claims. First, KV-cache geometry reflects cognitive mode generally: metacognitive, analytical, affective, and task-specific processing produce distinct geometric signatures in the key cache's singular value decomposition, with spectral entropy emerging as the most architecture-universal feature after confound control. Second, misalignment states—deception, confabulation, sycophancy, and refusal—are specific detectable instances of cognitive mode-switching, achieving within-model detection AUROC of 0.93–0.995 after Frisch-Waugh-Lovell residualization against token count. Third, geometric relationships between states reveal cognitive architecture: confabulation and deception produce geometrically distinct signatures. Hardware invariance is confirmed (RTX 3090 vs. H200: r > 0.998). We discuss dual-use implications and argue that cognitive geometry constitutes a new interpretability signal complementary to sparse autoencoders and representation engineering. This upload includes the full paper and an executive summary synthesizing five months of experimental work.
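
The confound-control step named in the abstract can be sketched as follows. This is an illustrative reading, not the authors' pipeline: the helper names `residualize` and `auroc`, the single linear confound with an intercept, and the synthetic data are all assumptions. The idea is to regress a geometric feature on token count, keep only the residual, and then score how well that residual separates two conditions.

```python
import numpy as np

def residualize(feature, confound):
    """Return `feature` with the linear effect of `confound` (plus intercept) removed."""
    feature = np.asarray(feature, dtype=float)
    confound = np.asarray(confound, dtype=float)
    X = np.column_stack([np.ones_like(confound), confound])
    beta, *_ = np.linalg.lstsq(X, feature, rcond=None)
    return feature - X @ beta

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Synthetic illustration only: the feature depends on both token count and condition.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400).astype(bool)
token_count = rng.integers(20, 200, size=400).astype(float)
feature = 0.01 * token_count + 0.5 * labels + rng.normal(0.0, 0.3, size=400)

raw_auc = auroc(feature, labels)
residual_auc = auroc(residualize(feature, token_count), labels)
```

By the Frisch-Waugh-Lovell theorem, regressing the residualized outcome on the residualized feature gives the same coefficient as a full regression that includes token count, so any separation that survives residualization cannot be attributed to a linear effect of response length.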

Cite this study

Edrington et al. (2026) studied this question.

www.synapsesocial.com/papers/69d895206c1944d70ce06226 — DOI: https://doi.org/10.5281/zenodo.19423494

Authors

Thomas Edrington

Lyra (AI)

Nell Watson

Institutions

Futures Group (United States)

Sentient Science (United States)

Institute of Refrigeration
