In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
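As a rough illustration of the setting only (this is not the authors' model or code), the sketch below assumes a simplification common in the linear-attention ICL literature. The prediction for a query x_q is y_hat = x_q^T Gamma (1/n) sum_i y_i x_i with a single d×d matrix Gamma. "Low-rank" tasks are modeled as w = U z, drawn from a fixed r-dimensional subspace spanned by orthonormal U. Rather than pre-training Gamma, it hand-sets two choices and compares them: Gamma = I, and the projector Gamma = U U^T onto the shared task subspace. All names and parameter values (d, r, n_ctx, noise) are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def gram_schmidt(vectors):
    """Orthonormal basis for the span of `vectors` (assumed linearly independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def sample_task(basis, rng):
    """Hypothetical low-rank task: w = U z with z ~ N(0, I_r)."""
    d = len(basis[0])
    z = [rng.gauss(0.0, 1.0) for _ in basis]
    return [sum(zk * basis[k][j] for k, zk in enumerate(z)) for j in range(d)]


def linear_attention_predict(gamma, xs, ys, x_query):
    """Reduced linear-attention readout: x_q^T Gamma (1/n) sum_i y_i x_i."""
    n, d = len(xs), len(x_query)
    h = [sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    gamma_h = [dot(row, h) for row in gamma]
    return dot(x_query, gamma_h)


def mean_sq_error(gamma, basis, n_ctx, noise, trials, rng):
    """Monte Carlo estimate of the squared prediction error on fresh tasks."""
    d = len(basis[0])
    total = 0.0
    for _ in range(trials):
        w = sample_task(basis, rng)
        xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_ctx)]
        ys = [dot(w, x) + rng.gauss(0.0, noise) for x in xs]
        xq = [rng.gauss(0.0, 1.0) for _ in range(d)]
        yq = dot(w, xq) + rng.gauss(0.0, noise)
        total += (linear_attention_predict(gamma, xs, ys, xq) - yq) ** 2
    return total / trials


d, r, n_ctx, noise = 20, 2, 10, 0.1
rng = random.Random(0)
basis = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)])
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
# Projector U U^T onto the shared r-dimensional task subspace.
projector = [[sum(b[i] * b[j] for b in basis) for j in range(d)] for i in range(d)]

err_id = mean_sq_error(identity, basis, n_ctx, noise, 2000, rng)
err_proj = mean_sq_error(projector, basis, n_ctx, noise, 2000, rng)
print(f"Gamma = I: {err_id:.3f}   Gamma = UU^T: {err_proj:.3f}")
```

With these settings the projector should give a markedly lower error, because it discards the estimation noise of (1/n) sum_i y_i x_i outside the task subspace. This toy contrast only hints at why exploiting shared task structure helps. It does not reproduce the paper's high-dimensional analysis, implicit regularization, or phase transition.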
Kaito Takanami
T. Takahashi
Yoshiyuki Kabashima
Takanami et al. studied this question.
www.synapsesocial.com/papers/68e861857ef2f04ca37e39c2 — DOI: https://doi.org/10.48550/arxiv.2510.04548