What question did this study set out to answer?

The research aims to explore how noise in self-attention contributes to rank collapse in large language models.

March 31, 2026 | Open Access

Noise Accumulation and Rank Collapse in Dense Self-Attention: A Theoretical Framework

Key Points

Analysis of the self-attention mechanism in Transformer decoders
Theoretical framework development
Proposition of Dynamic Sparse Attention with Landmark Tokens
Demonstrated that noise accumulates, corrupting token representations
Confirmed the link between noise and rank collapse in self-attention
Proposed a method (DSALT) that preserves long-range dependencies while reducing noise

Abstract

Large language models based on the Transformer decoder architecture perform multi-head self-attention over all previous tokens in the context window. We argue that this dense attention mechanism introduces a systematic form of noise: every token, regardless of semantic relevance, contributes a strictly positive weight to every other token's representation via the softmax operation. This noise accumulates across attention heads and layers, progressively corrupting token representations. We show that this accumulation accelerates the rank collapse phenomenon established by Dong et al., in which self-attention networks converge doubly exponentially to a rank-1 matrix with depth. We conjecture that this mechanism is a structural cause of hallucinations in large language models, consistent with empirical evidence on long-context degradation. To address this, we propose Dynamic Sparse Attention with Landmark Tokens (DSALT), a mechanism that replaces dense attention with an adaptive local window augmented by a small set of globally informative tokens, reducing noise at its source while preserving essential long-range dependencies, with implications for both model reliability and computational efficiency at scale.
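
The central claim of the abstract, that softmax assigns a strictly positive weight to every token, can be illustrated with a toy NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`dense_noise_mass`, `window_plus_landmarks_mask`, `masked_softmax`), the score values, the fixed window size, and the hand-picked landmark positions are all hypothetical. In particular, the abstract describes the DSALT window as adaptive and the landmark tokens as globally informative, and it does not say how either is chosen, so the mask below is only a simplified stand-in for the general idea.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def dense_noise_mass(n_context, relevant_score=4.0, noise_score=0.0):
    """Attention mass one query gives to irrelevant keys under dense softmax.

    Toy assumption: key 0 is the only relevant token and gets a high score;
    the remaining n_context - 1 keys share a low score.
    """
    scores = np.full(n_context, noise_score, dtype=float)
    scores[0] = relevant_score
    weights = softmax(scores)
    return weights, 1.0 - weights[0]


def window_plus_landmarks_mask(n_context, window, landmarks):
    """Causal boolean mask: local window plus fixed landmark positions.

    Assumption: fixed window and hand-picked landmarks, unlike the adaptive
    mechanism the abstract describes.
    """
    q = np.arange(n_context)[:, None]  # query positions (rows)
    k = np.arange(n_context)[None, :]  # key positions (columns)
    causal = k <= q
    local = (q - k) < window
    is_landmark = np.zeros(n_context, dtype=bool)
    is_landmark[list(landmarks)] = True
    return causal & (local | is_landmark[None, :])


def masked_softmax(scores, mask):
    """Softmax that gives exactly zero weight to masked-out keys."""
    return softmax(np.where(mask, scores, -np.inf))


# Dense attention: every weight is positive, and the share of mass on
# irrelevant tokens grows as the context gets longer.
for n in (16, 128, 1024):
    weights, noise = dense_noise_mass(n)
    print(n, bool((weights > 0).all()), round(float(noise), 3))

# Sparse attention: keys outside the window and landmark set get zero weight.
mask = window_plus_landmarks_mask(n_context=8, window=2, landmarks=[0])
sparse = masked_softmax(np.zeros((8, 8)), mask)
print(sparse[7])
```

In this toy setting the irrelevant tokens never receive zero weight under dense softmax, so their combined share grows with context length even though each individual weight shrinks. The masked variant removes that contribution entirely for tokens outside the window and landmark set, which is the sense in which sparsification reduces noise "at its source".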

Cite this study

Leonardo Cofone studied this question.

www.synapsesocial.com/papers/69cb6589e6a8c024954b98c4 — DOI: https://doi.org/10.5281/zenodo.19312826
