August 28, 2024 · Open Access

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Abstract

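In Mixture-of-Experts (MoE) models, unbalanced expert load leads to routing collapse or increased computational overhead. Existing methods commonly use an auxiliary loss to encourage load balance, but a large auxiliary loss introduces non-negligible interference gradients into training and can impair model performance. To control load balance without producing undesired gradients during training, we propose Loss-Free Balancing, a load balancing strategy that uses no auxiliary loss. Specifically, before the top-K routing decision, Loss-Free Balancing applies an expert-wise bias to the routing scores of each expert. By dynamically updating each expert's bias according to its recent load, Loss-Free Balancing can consistently maintain a balanced distribution of expert load. In addition, because Loss-Free Balancing produces no interference gradients, it also raises the upper bound of model performance attainable from MoE training. We validate Loss-Free Balancing on MoE models with up to 3B parameters trained on up to 200B tokens. Experimental results show that Loss-Free Balancing achieves both better performance and better load balance than traditional auxiliary-loss-controlled load balancing strategies.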
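
The mechanism described in the abstract (add a per-expert bias to the routing scores before top-K selection, then adjust each bias according to recent load) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the update rule, so the sketch assumes a fixed-step, sign-based update controlled by a hypothetical `update_rate`, uses synthetic Gaussian routing scores with an artificial skew, and applies the bias to expert selection only. All function and parameter names are illustrative.

```python
import random


def route_top_k(scores, bias, k):
    """Pick k experts per token from biased scores (bias affects selection only)."""
    assignments = []
    for token_scores in scores:
        biased = [s + b for s, b in zip(token_scores, bias)]
        # Indices of the k largest biased scores.
        top = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:k]
        assignments.append(top)
    return assignments


def expert_load(assignments, num_experts):
    """Count how many tokens were routed to each expert."""
    load = [0] * num_experts
    for top in assignments:
        for i in top:
            load[i] += 1
    return load


def update_bias(bias, load, update_rate):
    """Assumed rule: raise the bias of underloaded experts, lower it for overloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, c in zip(bias, load):
        error = mean_load - c
        sign = (error > 0) - (error < 0)  # -1, 0, or +1
        new_bias.append(b + update_rate * sign)
    return new_bias


def imbalance(load):
    """Max load divided by mean load, minus 1 (0 means perfectly balanced)."""
    mean_load = sum(load) / len(load)
    return max(load) / mean_load - 1.0


def simulate(num_experts=8, k=2, tokens=512, steps=100, update_rate=0.01, seed=0):
    """Run repeated routing batches, updating the bias after each batch."""
    rng = random.Random(seed)
    # Hypothetical skew: expert i's scores are centred on 0.1 * i, so
    # unbiased routing favours the high-index experts.
    skew = [0.1 * i for i in range(num_experts)]
    bias = [0.0] * num_experts
    history = []
    for _ in range(steps):
        scores = [
            [rng.gauss(skew[i], 1.0) for i in range(num_experts)]
            for _ in range(tokens)
        ]
        load = expert_load(route_top_k(scores, bias, k), num_experts)
        history.append(load)
        bias = update_bias(bias, load, update_rate)
    return history, bias


if __name__ == "__main__":
    history, bias = simulate()
    print("imbalance at first step:", round(imbalance(history[0]), 3))
    print("imbalance at last step: ", round(imbalance(history[-1]), 3))
```

Because the bias is adjusted outside of backpropagation, the balancing step in this sketch contributes no gradient term, which is the property the abstract emphasizes. The step size and the use of a single batch as the "recent load" are choices made for this illustration only.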

Authors

Lean Wang

Huazuo Gao

Chenggang Zhao
