June 24, 2024
Open Access

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Abstract

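Mixture-of-Experts (MoE) has gained increasing attention as a promising framework for scaling up large language models (LLMs). However, training an MoE from scratch at large scale still suffers from heavy data requirements and training instability. Motivated by this limitation, we investigate how to build MoE models from existing dense LLMs. Specifically, starting from the well-known LLaMA-2 7B model, we obtain an MoE model in two steps: (1) Expert Construction, which partitions the parameters of the original feed-forward networks (FFNs) into multiple experts; and (2) Continual Pre-training, which further trains the converted MoE model together with additional gate networks. In this paper, we comprehensively explore different methods for expert construction and various data sampling strategies for continual pre-training. After these stages, our LLaMA-MoE models maintain their language abilities and route input tokens to specific experts while activating only part of the parameters. Empirically, after training on 200B tokens, the LLaMA-MoE-3.5B models significantly outperform dense models with a similar number of activated parameters. The source code and models are available at https://github.com/pjlab-sys4nlp/llama-moe.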
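The two-step recipe above can be illustrated on a toy scale. The plain-Python sketch below assumes a LLaMA-style SwiGLU feed-forward block and uses random, equal-size partitioning of the intermediate neurons as the expert-construction method; this is only one possible partitioning strategy, and the helper names (swiglu_ffn, random_split, moe_forward), the linear top-k softmax router, and the renormalized gate weights are hypothetical illustrations, not the authors' implementation. It checks that running every expert and summing their outputs reproduces the dense FFN, then routes one token to its top-2 experts.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def silu(z: float) -> float:
    """SiLU (swish) activation, as used in LLaMA's gated FFN."""
    return z / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix,
               neurons: Sequence[int]) -> Vector:
    """SwiGLU FFN evaluated only over the given intermediate neurons.

    w_gate, w_up: d_ff rows of length d_model; w_down: d_model rows of length d_ff.
    """
    out = [0.0] * len(w_down)
    for i in neurons:
        h = silu(dot(w_gate[i], x)) * dot(w_up[i], x)  # activation of neuron i
        for j in range(len(w_down)):
            out[j] += w_down[j][i] * h
    return out


def random_split(d_ff: int, num_experts: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical expert construction: shuffle neuron indices, cut into equal groups."""
    idx = list(range(d_ff))
    random.Random(seed).shuffle(idx)
    size = d_ff // num_experts
    return [sorted(idx[e * size:(e + 1) * size]) for e in range(num_experts)]


def softmax(zs: Sequence[float]) -> Vector:
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, w_gate, w_up, w_down, experts, router, k) -> Tuple[Vector, List[int]]:
    """Assumed gating: run only the top-k experts, mixed by renormalized router scores."""
    scores = softmax([dot(r, x) for r in router])
    top = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)[:k]
    norm = sum(scores[e] for e in top)
    out = [0.0] * len(w_down)
    for e in top:
        y = swiglu_ffn(x, w_gate, w_up, w_down, experts[e])
        for j in range(len(out)):
            out[j] += (scores[e] / norm) * y[j]
    return out, top


# Toy sizes and random weights (placeholders, not LLaMA-2 7B values).
rng = random.Random(42)
d_model, d_ff, n_experts = 4, 8, 4
w_gate = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]
w_up = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]
w_down = [[rng.gauss(0, 1) for _ in range(d_ff)] for _ in range(d_model)]
router = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(d_model)]

experts = random_split(d_ff, n_experts)
dense = swiglu_ffn(x, w_gate, w_up, w_down, range(d_ff))
# With every expert active, the expert outputs sum back to the dense FFN output.
summed = [sum(v) for v in zip(*(swiglu_ffn(x, w_gate, w_up, w_down, e) for e in experts))]
sparse_out, chosen = moe_forward(x, w_gate, w_up, w_down, experts, router, k=2)
print("dense :", [round(v, 4) for v in dense])
print("summed:", [round(v, 4) for v in summed])
print("top-2 experts:", chosen)
```

Because the SwiGLU activation acts neuron by neuron, any disjoint partition of the intermediate neurons preserves the dense output when all experts fire. Sparsity, and with it any drop in quality, enters only through the top-k router, which in the recipe described in the abstract is trained during continual pre-training.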

Authors

Tong Zhu

Xiaoye Qu

Daize Dong
