What type of study is this?

This is a quantitative study.

October 3, 2025 | Open Access

DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

Key Points

DiEP retains approximately 92% of the original performance on Mixtral 8×7B while keeping only half the experts.
The proposed method outperforms other pruning methods by up to 7.1% on the MMLU dataset.
Non-uniform expert pruning adjusts the pruning rate per layer to account for varying expert redundancy, which uniform sparsity does not capture.
The approach transforms the global discrete search space into a continuous one, enabling gradient-based optimization.
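
To make the contrast between uniform and non-uniform pruning concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the importance scores are made up for illustration, and the helper names `uniform_keep` and `non_uniform_keep` as well as the `min_per_layer` floor are assumptions introduced here. It only shows how a shared global budget can leave different numbers of experts in different layers.

```python
from typing import Dict, List


def uniform_keep(scores: Dict[int, List[float]], keep_per_layer: int) -> Dict[int, List[int]]:
    """Keep the same number of top-scoring experts in every layer."""
    kept = {}
    for layer, s in scores.items():
        order = sorted(range(len(s)), key=lambda e: -s[e])
        kept[layer] = sorted(order[:keep_per_layer])
    return kept


def non_uniform_keep(scores: Dict[int, List[float]], total_keep: int,
                     min_per_layer: int = 1) -> Dict[int, List[int]]:
    """Keep `total_keep` experts overall, ranked globally, with a per-layer floor."""
    kept: Dict[int, List[int]] = {}
    candidates = []
    for layer, s in scores.items():
        order = sorted(range(len(s)), key=lambda e: -s[e])
        kept[layer] = order[:min_per_layer]          # every layer keeps its best expert(s)
        candidates += [(s[e], layer, e) for e in order[min_per_layer:]]
    budget = max(total_keep - sum(len(v) for v in kept.values()), 0)
    # Spend the remaining budget on the globally highest-scoring experts.
    for _, layer, e in sorted(candidates, key=lambda t: -t[0])[:budget]:
        kept[layer].append(e)
    return {layer: sorted(v) for layer, v in kept.items()}


# Hypothetical importance scores (illustrative only, not from the paper).
scores = {
    0: [0.9, 0.8, 0.7, 0.6],    # low redundancy: most experts matter
    1: [0.9, 0.1, 0.05, 0.02],  # high redundancy: one expert dominates
}
print(uniform_keep(scores, 2))      # {0: [0, 1], 1: [0, 1]}
print(non_uniform_keep(scores, 4))  # {0: [0, 1, 2], 1: [0]}
```

With the same total budget of four experts, the uniform rule discards a useful expert in layer 0 and keeps a nearly redundant one in layer 1, while the global ranking shifts the budget toward the layer with less redundancy.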

Abstract

Despite the significant breakthrough of Mixture-of-Experts (MoE), the increasing scale of these MoE models presents huge memory and storage challenges. Existing MoE pruning methods, which involve reducing parameter size with a uniform sparsity across all layers, often lead to suboptimal outcomes and performance degradation due to varying expert redundancy in different MoE layers. To address this, we propose a non-uniform pruning strategy, dubbed Differentiable Expert Pruning (DiEP), which adaptively adjusts pruning rates at the layer level while jointly learning inter-layer importance, effectively capturing the varying redundancy across different MoE layers. By transforming the global discrete search space into a continuous one, our method handles exponentially growing non-uniform expert combinations, enabling adaptive gradient-based pruning. Extensive experiments on five advanced MoE models demonstrate the efficacy of our method across various NLP tasks. Notably, DiEP retains around 92% of original performance on Mixtral 8×7B with only half the experts, outperforming other pruning methods by up to 7.1% on the challenging MMLU dataset.
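
The idea of replacing a discrete keep-or-drop choice with a continuous, trainable score can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the DiEP objective: each expert is reduced to a single scalar contribution `w`, the relaxed keep score is `sigmoid(alpha)`, and the loss (squared reconstruction error plus a sparsity penalty `lam`) and the hyperparameters `lr` and `steps` are invented here. After gradient descent, the learned scores are ranked globally, so the number of experts kept per layer can differ.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def learn_keep_scores(weights, lam=0.1, lr=0.5, steps=300):
    """Learn one relaxed keep score per expert for a single toy layer.

    Minimizes (sum_e (g_e - 1) * w_e)^2 + lam * sum_e g_e by plain
    gradient descent, where g_e = sigmoid(alpha_e) relaxes a binary mask.
    """
    alpha = [0.0] * len(weights)  # all scores start at 0.5
    for _ in range(steps):
        g = [sigmoid(a) for a in alpha]
        # Gap between the gated layer output and the full (unpruned) output.
        residual = sum((ge - 1.0) * w for ge, w in zip(g, weights))
        for e, w in enumerate(weights):
            grad_g = 2.0 * residual * w + lam            # dLoss/dg_e
            alpha[e] -= lr * grad_g * g[e] * (1.0 - g[e])  # chain rule through sigmoid
    return [sigmoid(a) for a in alpha]


# Hypothetical per-expert contributions for two toy layers (illustrative only).
layers = {0: [1.0, 0.9, 0.8, 0.7], 1: [1.0, 0.05, 0.02, 0.01]}
scores = {layer: learn_keep_scores(w) for layer, w in layers.items()}

# Rank all experts globally by learned score and keep a fixed total budget.
ranked = sorted(
    ((s, layer, e) for layer, ss in scores.items() for e, s in enumerate(ss)),
    reverse=True,
)
kept = sorted((layer, e) for _, layer, e in ranked[:5])
print(kept)
```

In this toy setup the layer whose experts all contribute substantially keeps every expert, while the layer dominated by one expert keeps only that one, which mirrors the non-uniform behavior the abstract describes.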

Cite this study

Bai et al. (2025) studied this question.

www.synapsesocial.com/papers/68e040eda99c246f578b3452 — DOI: https://doi.org/10.48550/arxiv.2509.16105

Authors

Sikai Bai

H. J. Li

Jie Zhang
