April 7, 2024 · Open Access

SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

Abstract

The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoE) models, known for their dynamic allocation of computational resources based on input. Despite their promise, MoEs face challenges, particularly in terms of memory requirements. To address this, our work introduces SEER-MoE, a novel two-stage framework for reducing both the memory footprint and compute requirements of pre-trained MoE models. The first stage prunes the total number of experts, guided by heavy-hitters counting, while the second stage employs a regularization-based fine-tuning strategy to recover the lost accuracy and reduce the number of activated experts during inference. Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoE model optimized for inference efficiency with minimal accuracy trade-offs.
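The first stage described above can be illustrated with a minimal sketch. It assumes that "heavy-hitters counting" means tallying how often the router selects each expert on some calibration data and keeping the most frequently selected ones; the abstract does not spell out the exact criterion, so the function names, the tie-breaking rule, and the toy routing trace below are all hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def count_expert_activations(routing_decisions: Iterable[Sequence[int]]) -> Counter:
    """Count how often each expert index appears in the router's top-k choices."""
    counts: Counter = Counter()
    for selected_experts in routing_decisions:
        counts.update(selected_experts)
    return counts


def select_experts_to_keep(counts: Counter, num_experts: int, num_keep: int) -> List[int]:
    """Return the sorted indices of the num_keep most frequently activated experts."""
    if not 0 < num_keep <= num_experts:
        raise ValueError("num_keep must be between 1 and num_experts")
    # Rank by descending activation count; ties go to the lower expert index
    # (an arbitrary choice made here only so the result is deterministic).
    ranked = sorted(range(num_experts), key=lambda e: (-counts.get(e, 0), e))
    return sorted(ranked[:num_keep])


# Hypothetical routing trace: each entry holds the top-2 experts chosen for one token.
trace = [(0, 3), (0, 1), (3, 0), (2, 0), (3, 1), (0, 3)]
counts = count_expert_activations(trace)
kept = select_experts_to_keep(counts, num_experts=4, num_keep=2)
print(kept)
```

In this toy trace experts 0 and 3 are selected most often, so they are the ones retained. A real pipeline would gather such counts per MoE layer and then drop the weights of the experts that are not kept. The second stage (regularized fine-tuning) is not sketched here because the abstract does not specify the regularizer.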

Cite this study

Muzio et al. (2024) studied this question.

www.synapsesocial.com/papers/68e701fab6db64358767c0eb — DOI: https://doi.org/10.48550/arxiv.2404.05089

Also consider

Synapse has enriched 5 closely related papers on similar topics. Consider them for comparative context:

AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference · 2024 · 13 citations
Efficiently Editing Mixture-of-Experts Models with Compressed Experts · 2025
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs · 2024 · 2 citations
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts · 2024
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study · 2024 · 2 citations

Authors

Alexandre Muzio

Alexander Y. Sun

Churan He
