Transformer hidden states contain a single linear direction that separates retrieval from computation. Within individual subjects like physics and geography, this direction distinguishes recall from reasoning at AUC 0.996–1.000. It also generalizes beyond clean category boundaries: among wrong MMLU answers, it classifies reasoning errors versus factual errors at AUC 0.878–0.951. We validated this across twelve models spanning five architecture families, from 70M to 30B parameters, and the direction remains nearly orthogonal to correctness throughout. Detection requires only one dot product per prompt, with no training or model modification.
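To make the "one dot product per prompt" claim concrete, here is a minimal sketch of scoring hidden states against a fixed direction and measuring separation by AUC. The abstract does not say how the direction is obtained; the difference-of-class-means construction below is an assumption chosen only for illustration, and the helper names (`mean_difference_direction`, `projection_score`, `auc`) are hypothetical. The data are synthetic Gaussian vectors, not real model activations, so the resulting AUC says nothing about the paper's results.

```python
import numpy as np

def mean_difference_direction(recall_states, reasoning_states):
    """Unit vector from the recall mean to the reasoning mean.
    ASSUMPTION: the paper's actual construction is not given in the abstract."""
    d = reasoning_states.mean(axis=0) - recall_states.mean(axis=0)
    return d / np.linalg.norm(d)

def projection_score(hidden_state, direction):
    """Detection step: a single dot product per prompt."""
    return float(np.dot(hidden_state, direction))

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

# Synthetic stand-ins for hidden states (hypothetical, for illustration only).
rng = np.random.default_rng(0)
dim = 64
shift = np.zeros(dim)
shift[0] = 4.0  # the two classes differ along one axis
recall = rng.normal(size=(200, dim))
reasoning = rng.normal(size=(200, dim)) + shift

# Estimate the direction on one half, evaluate on the held-out half.
direction = mean_difference_direction(recall[:100], reasoning[:100])
recall_scores = [projection_score(h, direction) for h in recall[100:]]
reasoning_scores = [projection_score(h, direction) for h in reasoning[100:]]
held_out_auc = auc(reasoning_scores, recall_scores)
print(f"held-out AUC on synthetic data: {held_out_auc:.3f}")
```

With real models, `hidden_state` would be an activation vector taken from a chosen layer and token position; which layer and position the paper uses is not stated here, so that choice is left open.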
Sam Ramdan
Sam Ramdan (Tue,) studied this question.
www.synapsesocial.com/papers/69d8940c6c1944d70ce05100 — DOI: https://doi.org/10.5281/zenodo.19455724