Mixture-of-Experts (MoE) models are designed to enhance the efficiency of large language models (LLMs) without proportionally increasing the computational demands. However, their deployment on edge devices still faces significant challenges due to high on-demand loading overheads from managing sparsely activated experts. This paper introduces AdapMoE, an algorithm-system co-design framework for efficient MoE inference. AdapMoE features adaptive expert gating and management to reduce the on-demand loading overheads. We observe the heterogeneity of experts loading across layers and tokens, based on which we propose a sensitivity-based strategy to adjust the number of activated experts dynamically. Meanwhile, we also integrate advanced prefetching and cache management techniques to further reduce the loading latency. Through comprehensive evaluations on various platforms, we demonstrate AdapMoE consistently outperforms existing techniques, reducing the average number of activated experts by 25% and achieving a 1.35x speedup without accuracy degradation. Code is available at: https://github.com/PKU-SEC-Lab/AdapMoE.
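The adaptive gating idea can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: the threshold `tau`, the function name `adaptive_expert_selection`, and the cap `max_experts` are hypothetical stand-ins for the paper's sensitivity-based, per-layer and per-token criterion. It activates only as many experts per token as are needed for the cumulative router probability to reach `tau`.

```python
import torch

def adaptive_expert_selection(router_logits, max_experts=4, tau=0.9):
    """Illustrative sketch (not AdapMoE's exact algorithm): per token, keep the
    smallest set of top experts whose cumulative router probability reaches
    `tau`, capped at `max_experts`. `tau` stands in for the paper's
    sensitivity-based criterion."""
    probs = torch.softmax(router_logits, dim=-1)                 # (tokens, num_experts)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    # Experts needed so cumulative probability >= tau (at least 1, at most max_experts).
    k_per_token = (cum < tau).sum(dim=-1).clamp(max=max_experts - 1) + 1
    selections = []
    for t in range(probs.shape[0]):
        k = int(k_per_token[t])
        idx = sorted_idx[t, :k]
        w = sorted_probs[t, :k]
        selections.append((idx, w / w.sum()))                    # renormalized weights
    return selections

# Example: an 8-expert layer with 3 tokens; "easy" tokens whose router is
# confident end up with fewer activated experts, so fewer weights to load.
logits = torch.randn(3, 8)
for token, (experts, weights) in enumerate(adaptive_expert_selection(logits)):
    print(f"token {token}: experts {experts.tolist()}, weights {weights.tolist()}")
```

In the paper, this per-layer and per-token adaptivity is what lowers the average number of activated experts and thus the volume of expert weights fetched on demand; prefetching and cache management then hide much of the remaining loading latency. The fixed threshold above is only a stand-in for that sensitivity-based decision.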
Zhong et al. studied this question.
www.synapsesocial.com/papers/68e55b65e2b3180350ef90ad — DOI: https://doi.org/10.1145/3676536.3676741
Authors: Shuzhang Zhong, Ling Liang, Yuan Wang
Affiliations: Peking University; Beijing Advanced Sciences and Innovation Center; Beijing Academy of Artificial Intelligence