May 29, 2024 · Open Access

Why Larger Language Models Do In-context Learning Differently?


Abstract

Large language models (LLMs) have emerged as a powerful tool for AI. Their key ability is in-context learning (ICL): given only a brief series of task examples, they can perform well on unseen tasks without any adjustment to the model parameters. One recent and intriguing observation is that models of different scales may exhibit different ICL behaviors: larger models tend to be more sensitive to noise in the test context. This work studies this observation theoretically, aiming to improve the understanding of LLMs and ICL. We analyze two stylized settings: (1) linear regression with one-layer, single-head linear transformers and (2) parity classification with two-layer, multi-head transformers (non-linear data and a non-linear model). In both settings, we give closed-form optimal solutions and find that smaller models emphasize important hidden features while larger ones cover more hidden features. As a result, smaller models are more robust to noise, while larger ones are more easily distracted, which leads to different ICL behaviors. This sheds light on where transformers pay attention and how that affects ICL. Preliminary experimental results on large base and chat models provide positive support for our analysis.
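The intuition behind the linear-regression setting can be illustrated with a toy sketch. This is our own simplified construction under stated assumptions, not the paper's closed-form solution. It assumes a one-layer linear-attention predictor of the form ŷ = x_qᵀ Γ (1/N) Σᵢ yᵢ xᵢ, a form commonly used in analyses of linear attention on in-context regression. It also assumes inputs with a known diagonal covariance whose eigenvalues are sorted in decreasing order. Finally, it uses the rank r of the hypothetical weight matrix Γ as a stand-in for model scale: Γ keeps only the top-r feature directions, each rescaled by its inverse variance. The function name `truncated_linear_attention_predict` and all data values are hypothetical.

```python
"""Toy illustration (hypothetical setup, not the paper's derivation)."""
from typing import List


def truncated_linear_attention_predict(
    x_query: List[float],
    context_x: List[List[float]],
    context_y: List[float],
    eigvals: List[float],
    rank: int,
) -> float:
    """Predict y for x_query from in-context (x_i, y_i) pairs.

    Assumption: inputs have diagonal covariance diag(eigvals), sorted in
    decreasing order, so the weight matrix Gamma is diagonal with
    Gamma_jj = 1 / eigvals[j] for j < rank and 0 otherwise.
    Prediction: x_query^T Gamma h, with h = (1/N) * sum_i y_i x_i.
    """
    n = len(context_y)
    d = len(x_query)
    # h aggregates label-weighted inputs over the context, as a linear attention layer would
    h = [sum(y * x[j] for x, y in zip(context_x, context_y)) / n for j in range(d)]
    # Keep only the top-`rank` feature directions, each rescaled by its inverse variance
    return sum(x_query[j] * h[j] / eigvals[j] for j in range(min(rank, d)))


# Toy context: 4 points whose empirical covariance is exactly diag(4, 0.25).
EIGVALS = [4.0, 0.25]
CONTEXT_X = [[2.0, 0.5], [-2.0, 0.5], [2.0, -0.5], [-2.0, -0.5]]
W_TRUE = [1.0, 1.0]
X_QUERY = [1.0, 1.0]  # true label is 2.0

clean_y = [sum(w * xj for w, xj in zip(W_TRUE, x)) for x in CONTEXT_X]
# Label noise deliberately aligned with the low-variance (less important) feature.
noise = [1.0, 1.0, -1.0, -1.0]
noisy_y = [y + e for y, e in zip(clean_y, noise)]

for rank in (1, 2):  # rank 1 plays the "smaller model", rank 2 the "larger model"
    clean = truncated_linear_attention_predict(X_QUERY, CONTEXT_X, clean_y, EIGVALS, rank)
    noisy = truncated_linear_attention_predict(X_QUERY, CONTEXT_X, noisy_y, EIGVALS, rank)
    print(f"rank={rank}: clean={clean:.2f}, noisy={noisy:.2f}, shift={noisy - clean:.2f}")
# rank=1: clean=1.00, noisy=1.00, shift=0.00
# rank=2: clean=2.00, noisy=4.00, shift=2.00
```

Under these assumptions, the rank-2 ("larger") predictor recovers the clean label exactly. Its prediction moves by 2.0 under the noise, because the noise lands on the feature with variance 0.25 and is amplified by 1/0.25. The rank-1 ("smaller") predictor is biased (it predicts 1.0), but its output does not move at all. The noise was chosen to make this trade-off visible, so the example only mirrors the qualitative behavior the abstract describes.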


Cite this study

Shi et al. (2024) studied this question.

www.synapsesocial.com/papers/68e67e28b6db643587608195 — DOI: https://doi.org/10.48550/arxiv.2405.19592


Authors

Zhenmei Shi

Junyi Wei

Zhuoyan Xu
