What type of study is this?

This is a quantitative study.

October 3, 2025 | Open Access

DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

Key Points

DiEP retains approximately 92% of the original performance on Mixtral 8×7B with only half the experts, reducing the model's memory and storage footprint.
DiEP outperforms other pruning methods by up to 7.1% on the MMLU dataset.
Non-uniform expert pruning adapts the pruning rate to each layer, accounting for the varying expert redundancy across MoE layers.
The method transforms the global discrete search space into a continuous one, enabling gradient-based optimization.
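
The last point, relaxing a discrete keep/prune decision into something a gradient can act on, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's formulation: the scalar "expert outputs", the toy objective (reconstruction error plus a sparsity penalty), the sigmoid gates, and the helper names `learn_gates` and `discretize` are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def learn_gates(expert_outputs, sparsity=0.1, lr=0.5, steps=300):
    """Fit one continuous gate per expert by plain gradient descent.

    Toy objective (an assumption, not the paper's loss):
        (sum_i g_i * e_i - sum_i e_i)^2 + sparsity * sum_i g_i
    where g_i = sigmoid(alpha_i) relaxes a binary keep/prune mask.
    """
    target = sum(expert_outputs)          # output of the unpruned toy layer
    alphas = [0.0] * len(expert_outputs)  # continuous parameters, one per expert
    for _ in range(steps):
        gates = [sigmoid(a) for a in alphas]
        y = sum(g * e for g, e in zip(gates, expert_outputs))
        for i, (g, e) in enumerate(zip(gates, expert_outputs)):
            # d(loss)/d(alpha_i), using d(sigmoid)/d(alpha) = g * (1 - g)
            grad = (2.0 * (y - target) * e + sparsity) * g * (1.0 - g)
            alphas[i] -= lr * grad
    return [sigmoid(a) for a in alphas]

def discretize(gates, threshold=0.5):
    """Map the learned continuous gates back to a discrete set of kept experts."""
    return [i for i, g in enumerate(gates) if g > threshold]

# Experts 1 and 3 contribute nothing to the toy layer output, so the
# sparsity penalty pushes their gates toward zero.
gates = learn_gates([1.0, 0.0, 2.0, 0.0])
kept = discretize(gates)
```

In this toy setting the gates of the two non-contributing experts fall below the threshold while the other two stay above it, so the discrete choice is recovered from continuous parameters without enumerating subsets.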

Abstract

Despite the significant breakthrough of Mixture-of-Experts (MoE), the increasing scale of these MoE models presents huge memory and storage challenges. Existing MoE pruning methods, which involve reducing parameter size with a uniform sparsity across all layers, often lead to suboptimal outcomes and performance degradation due to varying expert redundancy in different MoE layers. To address this, we propose a non-uniform pruning strategy, dubbed Differentiable Expert Pruning (DiEP), which adaptively adjusts pruning rates at the layer level while jointly learning inter-layer importance, effectively capturing the varying redundancy across different MoE layers. By transforming the global discrete search space into a continuous one, our method handles exponentially growing non-uniform expert combinations, enabling adaptive gradient-based pruning. Extensive experiments on five advanced MoE models demonstrate the efficacy of our method across various NLP tasks. Notably, DiEP retains around 92% of original performance on Mixtral 8×7B with only half the experts, outperforming other pruning methods by up to 7.1% on the challenging MMLU dataset.
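
To make the contrast between uniform and non-uniform pruning concrete, here is a small sketch under stated assumptions. It takes per-expert scores and per-layer importance weights as given inputs (in the abstract these are learned jointly; how they are learned is not shown here), and it spends a single global expert budget across layers. The function names, the score values, and the `min_per_layer` safeguard are hypothetical and are not taken from the paper.

```python
def uniform_keep(scores, keep_ratio):
    """Baseline: keep the same fraction of experts in every layer."""
    kept = []
    for layer in scores:
        k = max(1, round(len(layer) * keep_ratio))
        ranked = sorted(range(len(layer)), key=lambda i: layer[i], reverse=True)
        kept.append(sorted(ranked[:k]))
    return kept

def nonuniform_keep(scores, layer_weights, keep_ratio, min_per_layer=1):
    """Spend one global budget across layers, so per-layer pruning rates differ.

    Each expert is ranked by layer_weights[l] * scores[l][i]; both inputs are
    assumed to be given. min_per_layer (an assumption) keeps every layer usable.
    """
    budget = round(sum(len(layer) for layer in scores) * keep_ratio)
    kept = []
    candidates = []
    for l, layer in enumerate(scores):
        ranked = sorted(range(len(layer)), key=lambda i: layer[i], reverse=True)
        kept.append(ranked[:min_per_layer])  # reserve the best expert(s) of each layer
        candidates += [(layer_weights[l] * layer[i], l, i)
                       for i in ranked[min_per_layer:]]
    remaining = max(0, budget - sum(len(k) for k in kept))
    # Fill the rest of the budget with the globally highest weighted scores.
    for _, l, i in sorted(candidates, reverse=True)[:remaining]:
        kept[l].append(i)
    return [sorted(k) for k in kept]

# Two toy layers with four experts each: layer 0 has little redundancy,
# layer 1 is dominated by a single expert.
scores = [[0.9, 0.8, 0.7, 0.6], [0.9, 0.3, 0.2, 0.1]]
uniform = uniform_keep(scores, keep_ratio=0.5)
nonuniform = nonuniform_keep(scores, layer_weights=[1.0, 1.0], keep_ratio=0.5)
```

With the same total budget of four experts, the uniform baseline keeps two experts per layer, while the global allocation keeps three in the first layer and one in the second, reflecting the different redundancy of the two toy layers.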

Authors

Sikai Bai

H. J. Li

Jie Zhang
