September 3, 2024 | Open Access

OLMoE: Open Mixture-of-Experts Language Models

Abstract

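We introduce OLMoE, a fully open, state-of-the-art language model that leverages sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrained it on 5 trillion tokens and further adapted it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with a similar number of active parameters, and even surpass larger models such as Llama2-13B-Chat and DeepSeekMoE-16B. We present a range of experiments on MoE training, analyze the routing in our model, which shows a high degree of specialization, and open-source every component: model weights, training data, code, and logs.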
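To make the gap between total and active parameters concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind sparse MoE layers. It is not the OLMoE implementation: the expert count, the value of `k`, the toy "experts" that only scale their input, and the choice not to renormalize the selected weights are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k most probable experts."""
    probs = softmax(router_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


def moe_layer(x, router_weights, experts, k):
    """Apply a toy sparse MoE layer to one token vector x.

    Only the k selected experts are evaluated; all other experts (and
    their parameters) are skipped for this token.
    """
    # One router logit per expert: dot product of its router row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    routed = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routed:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routed]


def make_expert(scale):
    """Toy expert (assumption): multiplies the input by a constant."""
    return lambda x: [scale * xi for xi in x]


# Illustrative setup: 4 experts, 2 active per token (values are assumptions).
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
token = [1.0, 0.0]

output, used = moe_layer(token, router_weights, experts, k=2)
active_fraction = len(used) / len(experts)
print("experts used:", used)
print("fraction of experts active:", active_fraction)
```

In this toy setup each token touches half of the experts. In a real MoE model the expert feed-forward blocks hold most of the parameters, which is how a model can store far more parameters than it uses for any single token.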

Authors

Niklas Muennighoff

Luca Soldaini

Dirk Groeneveld
