May 31, 2024 · Open Access

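How In-Context Learning Emerges from Training on Unstructured Data: The Role of Co-Occurrence, Positional Information, and Noise Structures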

Abstract

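Large language models (LLMs) such as transformers have impressive in-context learning (ICL) capabilities: they can generate predictions for new queries based on the input-output sequences in a prompt, without any parameter updates. Although many theories attempt to explain ICL, they typically focus on structured training data that resembles ICL tasks, such as regression. In practice, however, these models are trained in an unsupervised manner on unstructured text data that bears little resemblance to ICL tasks. To this end, we investigate how ICL emerges from unsupervised training on unstructured data. The key observation is that ICL can arise simply from modeling co-occurrence information with a classical language model such as continuous bag of words (CBOW), which we prove theoretically and validate empirically. Furthermore, we establish that positional information and noise structure are necessary for ICL to generalize to unseen data. Finally, we present instances where ICL fails and provide theoretical explanations; these suggest that the ability of LLMs to identify certain tasks through ICL can be sensitive to the structure of the training data.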
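The co-occurrence idea in the abstract can be illustrated with a small sketch. This is not the paper's experiment and not a trained CBOW model: it swaps in raw co-occurrence counts as a crude proxy for what CBOW learns, and the corpus, the function names (`cooccurrence_counts`, `score`, `predict`), and the country-capital task are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus (not from the paper): each sentence mentions a
# country and its capital together with one unrelated filler word.
CORPUS = [
    "france paris wine", "paris france museum", "france bread paris",
    "japan tokyo train", "tokyo japan garden", "japan tea tokyo",
    "peru lima mountain", "lima peru coast", "peru lima llama",
    "kenya nairobi safari", "nairobi kenya coffee", "kenya nairobi lion",
]


def cooccurrence_counts(sentences):
    """Count how often each unordered pair of words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):  # pairs come out as (a, b) with a < b
            counts[(a, b)] += 1
    return counts


def score(counts, candidate, context):
    """Sum the co-occurrence counts between a candidate and every context word."""
    return sum(counts[tuple(sorted((candidate, w)))] for w in context)


def predict(counts, vocab, prompt):
    """Return the word outside the prompt that co-occurs most with the prompt."""
    context = prompt.split()
    candidates = [w for w in vocab if w not in context]
    return max(candidates, key=lambda w: score(counts, w, context))


counts = cooccurrence_counts(CORPUS)
vocab = sorted({w for s in CORPUS for w in s.split()})

# Prompt in the "input output input output ... query" format described above.
print(predict(counts, vocab, "france paris japan tokyo peru"))  # prints: lima
```

In this toy setting the completion is driven by the query word alone, because "lima" co-occurs with "peru" more often than any filler word co-occurs with the prompt. It does not exercise positional information or noise structure, which the abstract identifies as necessary for generalizing to unseen data.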

Authors

Kevin Christian Wibisono

Yixin Wang
