May 23, 2024 · Open Access

LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

Abstract

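Mixture-of-Experts (MoE) architectures have recently attracted growing attention in the field of large language models (LLMs) because they can significantly reduce training and inference overhead. However, MoE architectures face several challenges: the number of tokens assigned to each expert varies widely, and experts tend to homogenize, which weakens the model's semantic generation capability. This paper proposes LocMoE+, an improved version of the low-overhead LocMoE, with the following enhancements: (1) quantifying and defining the affinity between experts and tokens; (2) implementing a global-level adaptive routing strategy that rearranges tokens according to their affinity scores; (3) re-estimating the lower bound of expert capacity, which is shown to decrease progressively as the token feature distribution evolves. Experimental results show that, without compromising model convergence or efficacy, the number of tokens each expert processes can be reduced by more than 60%. Combined with communication optimizations, training efficiency improves by 5.4% to 46.6% on average. After fine-tuning, LocMoE+ improves performance by 9.7% to 14.1% on the GDAD, C-Eval, and TeleQnA datasets.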
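The abstract does not spell out how affinity is computed or how tokens are rearranged, so the following is only a minimal sketch under stated assumptions: affinity is taken to be the cosine similarity between a token feature vector and a per-expert vector, each token is routed to its highest-affinity expert, and each expert keeps only its `capacity` highest-affinity tokens. The names `cosine_affinity`, `route_by_affinity`, and `capacity` are hypothetical and do not come from the paper.

```python
import math

def cosine_affinity(token, expert_vec):
    """Assumed affinity measure: cosine similarity of token and expert vectors."""
    dot = sum(t * e for t, e in zip(token, expert_vec))
    norm = math.sqrt(sum(t * t for t in token)) * math.sqrt(sum(e * e for e in expert_vec))
    return dot / norm if norm > 0 else 0.0

def route_by_affinity(tokens, expert_vecs, capacity):
    """Top-1 routing, then keep the `capacity` highest-affinity tokens per expert.

    Returns a dict mapping expert index to the list of kept token indices.
    Tokens beyond an expert's capacity are dropped in this sketch.
    """
    buckets = {e: [] for e in range(len(expert_vecs))}
    for i, tok in enumerate(tokens):
        scores = [cosine_affinity(tok, vec) for vec in expert_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        buckets[best].append((scores[best], i))

    assignment = {}
    for e, scored in buckets.items():
        # Rearrange each expert's tokens by descending affinity before truncating.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        assignment[e] = [i for _, i in scored[:capacity]]
    return assignment

# Toy data: four 2-D token features and two expert vectors (illustrative only).
tokens = [[1.0, 0.1], [0.9, 0.3], [0.1, 1.0], [0.6, 0.5]]
expert_vecs = [[1.0, 0.0], [0.0, 1.0]]
assignment = route_by_affinity(tokens, expert_vecs, capacity=2)
print(assignment)
```

In this toy run three tokens prefer expert 0, but with a capacity of 2 only the two with the highest affinity are kept. This illustrates why a smaller capacity can be tolerable when kept tokens are chosen by affinity rather than by arrival order.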

Cite this study

Li et al. (2024) studied this question.

www.synapsesocial.com/papers/68e68d03b6db643587615001 — DOI: https://doi.org/10.48550/arxiv.2406.00023

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation · 2024 · 4 citations
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models · 2024 · 1 citation
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference · 2024 · 13 citations
A Closer Look into Mixture-of-Experts in Large Language Models · 2024 · 4 citations
Layerwise Recurrent Router for Mixture-of-Experts

Authors

Jing Li

Zhijie Sun

Dachao Lin
