May 3, 2026 · Open Access

Hallucination Fingerprints: Consistent Failure Patterns in Large Language Models

Key Points

This research aims to explore whether language model hallucinations exhibit consistent patterns in internal activations before errors occur.
Investigated a custom transformer model (806K parameters) and GPT-2 (124M parameters) across 20,000 factual prompts categorized into 7 knowledge areas.
Identified two phenomena: Relation Dropout and Last-Layer Suppression, which correlate with hallucination occurrences.
Developed HallScan, an open-source tool, and HallBench, a benchmark with 20,000 annotated examples.
Relation Dropout is observed before hallucination events in smaller models.
Correct factual information is identified in blocks 10–11 of GPT-2 but is overridden by block 12.
Established a three-type taxonomy for hallucinations, enhancing the understanding of model failures.

Abstract

When a language model confidently states that the capital of Germany is Paris, something has gone wrong inside the model before that word ever appears. This paper investigates what goes wrong. We ask whether hallucinations follow consistent, detectable patterns in the internal activations of transformer models prior to the generation of an incorrect token, and find that they do. Through experiments on a custom 806K-parameter transformer and GPT-2 (124M parameters), tested across 20,000 factual prompts in 7 knowledge categories, we identify two named phenomena. Relation Dropout: attention to the semantic relation token collapses in the final transformer block of small models before a hallucination occurs. Last-Layer Suppression: factual knowledge emerges correctly in blocks 10–11 of GPT-2 but is systematically overridden by block 12. We propose a three-type hallucination taxonomy and release two artifacts: HallScan (pip install hallscan), an open-source detection tool, and HallBench, a labeled benchmark of 20,000 annotated examples.
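To make the two phenomena concrete, here is a minimal sketch of how each could be turned into a yes/no check, assuming per-block statistics have already been extracted from the model. The function names, the `drop_ratio`, `window`, and `top_k` parameters, and their default thresholds are all hypothetical illustrations, not values or interfaces from the paper or from HallScan. The suppression check assumes a logit-lens style readout (decoding each block's residual stream through the unembedding), which is a swapped-in technique and not necessarily how the authors measured it. The example inputs are synthetic.

```python
"""Hypothetical detectors for Relation Dropout and Last-Layer Suppression.

Inputs are assumed to be precomputed per-block statistics. Thresholds are
illustrative assumptions, not values reported in the paper.
"""
from typing import Sequence


def relation_dropout(relation_attention: Sequence[float],
                     drop_ratio: float = 0.5) -> bool:
    """Flag a collapse of attention to the relation token in the final block.

    relation_attention[i] is assumed to be the head-averaged attention weight
    that the prediction position assigns to the relation token in block i.
    Returns True when the final block's value falls below drop_ratio times
    the mean over all earlier blocks.
    """
    if len(relation_attention) < 2:
        raise ValueError("need at least two blocks")
    *earlier, final = relation_attention
    baseline = sum(earlier) / len(earlier)
    return final < drop_ratio * baseline


def last_layer_suppression(correct_token_rank: Sequence[int],
                           window: int = 2,
                           top_k: int = 1) -> bool:
    """Flag a correct answer that emerges late but is overridden at the end.

    correct_token_rank[i] is assumed to be the rank (0 = most likely) of the
    correct token when the residual stream after block i is decoded through
    the unembedding (logit lens). Returns True when the correct token is in
    the top-k for each of the `window` blocks before the last block, but not
    in the top-k at the last block.
    """
    if len(correct_token_rank) < window + 1:
        raise ValueError("not enough blocks for the requested window")
    before_last = correct_token_rank[-(window + 1):-1]
    final = correct_token_rank[-1]
    return all(r < top_k for r in before_last) and final >= top_k


if __name__ == "__main__":
    # Synthetic ranks for a 12-block model such as GPT-2: with window=2, the
    # checked blocks are indices 9-10 (blocks 10-11) and index 11 (block 12).
    ranks = [40, 35, 30, 22, 18, 12, 9, 5, 3, 0, 0, 4]
    print(last_layer_suppression(ranks))  # True

    # Synthetic attention-to-relation values for a 5-block model.
    attn = [0.20, 0.22, 0.25, 0.24, 0.05]
    print(relation_dropout(attn))  # True: 0.05 < 0.5 * 0.2275
```

Comparing the final block against the mean of earlier blocks, rather than against a fixed absolute threshold, keeps the relation-dropout check insensitive to how attention is scaled in a particular model; either choice is a design assumption.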

Authors

Nikhil Upadhyay