We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.
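The abstract describes a sparse Mixture-of-Experts design in which each token activates only a small subset of the model's expert MLPs, which is why only about 1B of the 7B parameters are used per input token. The sketch below is a minimal, illustrative top-k MoE layer in PyTorch; the hidden sizes, expert count, and top-k value are placeholder assumptions for demonstration, not OLMoE's actual configuration, which is documented in the paper and the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sparse Mixture-of-Experts feed-forward layer with top-k routing.

    Illustrative values only; the real OLMoE hyperparameters are given in
    the paper and open-sourced training code.
    """

    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # x: (n_tokens, d_model). Each token is routed to its top-k experts,
        # so only a fraction of the layer's parameters is active per token.
        probs = F.softmax(self.router(x), dim=-1)              # (n_tokens, n_experts)
        weights, expert_ids = probs.topk(self.top_k, dim=-1)   # (n_tokens, top_k)
        # Renormalize over the chosen experts (one common convention).
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: 16 tokens of width 1024; each token runs through only top_k of the
# n_experts expert MLPs.
tokens = torch.randn(16, 1024)
layer = MoELayer()
print(layer(tokens).shape)  # torch.Size([16, 1024])
```

The per-expert loop keeps the routing logic explicit; production MoE implementations instead batch tokens per expert for efficiency, but the computation being sketched is the same.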
Niklas Muennighoff
Luca Soldaini
Dirk Groeneveld
Muennighoff et al. (2024) — www.synapsesocial.com/papers/68e597d2b6db6435875323ba — DOI: https://doi.org/10.48550/arxiv.2409.02060