May 7, 2024 · Open Access

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Abstract

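We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and diverse corpus of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions achieve top-tier performance among open-source models.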
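To put the headline figures in proportion, the short sketch below redoes the arithmetic using only the numbers quoted in the abstract. The helper names (`active_fraction`, `remaining_fraction`, `shrink_factor`) are hypothetical and introduced purely for illustration, and the script assumes each percentage is measured relative to DeepSeek 67B, as the abstract states. It is not part of the paper or of any DeepSeek codebase.

```python
# Figures quoted in the abstract (relative to DeepSeek 67B where applicable).
TOTAL_PARAMS_B = 236.0        # total parameters, in billions
ACTIVE_PARAMS_B = 21.0        # parameters activated per token, in billions
TRAINING_COST_SAVING = 0.425  # 42.5% of training costs saved
KV_CACHE_REDUCTION = 0.933    # 93.3% smaller KV cache


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def remaining_fraction(reduction: float) -> float:
    """Fraction of the baseline quantity left after a relative reduction."""
    return 1.0 - reduction


def shrink_factor(reduction: float) -> float:
    """How many times smaller a quantity becomes after a relative reduction."""
    return 1.0 / remaining_fraction(reduction)


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active parameters per token: {frac:.1%} of the total")
    print(f"Training cost vs. baseline:  {remaining_fraction(TRAINING_COST_SAVING):.1%}")
    print(f"KV cache vs. baseline:       {remaining_fraction(KV_CACHE_REDUCTION):.1%}")
    print(f"KV cache shrink factor:      {shrink_factor(KV_CACHE_REDUCTION):.1f}x")
```

Read this way, each token touches roughly 9% of the model's parameters, and a 93.3% reduction corresponds to a KV cache about 15 times smaller than the baseline's.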

Authors

DeepSeek-AI
