February 19, 2024 · Open Access

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

Abstract

The Mixture of Experts (MoE) paradigm provides a powerful way to decompose inscrutable dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. A major problem, however, lies in the computational cost of scaling the number of experts to achieve sufficiently fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts (MMoE) layer to address this, focusing on vision models. MMoE layers perform an implicit computation on prohibitively large weight tensors entirely in factorized form. Consequently, MMoEs both (1) avoid the issues incurred through the discrete expert routing in the popular 'sparse' MoE models, and (2) do not incur the restrictively high inference-time costs of 'soft' MoE alternatives. We present both qualitative and quantitative evidence (through visualization and counterfactual interventions, respectively) that scaling MMoE layers when fine-tuning foundation models for vision tasks leads to more specialized experts at the class level, whilst remaining competitive with the performance of parameter-matched linear layer counterparts. Finally, we show that learned expert specialism further facilitates manual correction of demographic bias in CelebA attribute classification. Our MMoE model code is available at https://github.com/james-oldfield/MMoE.
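
To make "computation entirely in factorized form" concrete, here is a minimal Python sketch; it is not the authors' implementation (that lives in the linked repository). It assumes a soft MoE layer whose N expert weight matrices are stacked into a tensor W of shape N × O × I, with a rank-R CP factorization W[n][o][i] = Σ_r A[n][r]·B[o][r]·C[i][r], and a softmax gate over a hypothetical gating matrix G. All names (A, B, C, G, and the function names) are illustrative assumptions. Because the gate-weighted sum over experts factors through the rank dimension, the layer output can be computed from the factors alone in O(R(N + O + I)) operations instead of the O(N·O·I) needed to form and contract W.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def dense_moe_forward(x, gates, W):
    """Reference soft-MoE output: y[o] = sum_n gates[n] * sum_i W[n][o][i] * x[i]."""
    N, O, I = len(W), len(W[0]), len(W[0][0])
    return [sum(gates[n] * W[n][o][i] * x[i] for n in range(N) for i in range(I))
            for o in range(O)]


def cp_weight_tensor(A, B, C):
    """Materialize W[n][o][i] = sum_r A[n][r] * B[o][r] * C[i][r] (used only to check)."""
    R = len(A[0])
    return [[[sum(A[n][r] * B[o][r] * C[i][r] for r in range(R))
              for i in range(len(C))]
             for o in range(len(B))]
            for n in range(len(A))]


def factorized_moe_forward(x, gates, A, B, C):
    """Same output as dense_moe_forward, computed from the CP factors without forming W."""
    R = len(A[0])
    # Input mode: u[r] = sum_i C[i][r] * x[i]         (cost R * I)
    u = [sum(C[i][r] * x[i] for i in range(len(x))) for r in range(R)]
    # Expert mode: v[r] = sum_n A[n][r] * gates[n]     (cost R * N)
    v = [sum(A[n][r] * gates[n] for n in range(len(gates))) for r in range(R)]
    # Output mode: y[o] = sum_r B[o][r] * u[r] * v[r]  (cost R * O)
    return [sum(B[o][r] * u[r] * v[r] for r in range(R)) for o in range(len(B))]


def rand_matrix(rows, cols):
    """Random Gaussian matrix as a list of lists."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: N experts, O outputs, I inputs, CP rank R.
random.seed(0)
N, O, I, R = 64, 3, 4, 5
A, B, C = rand_matrix(N, R), rand_matrix(O, R), rand_matrix(I, R)
G = rand_matrix(N, I)  # hypothetical gating weights
x = [random.gauss(0.0, 1.0) for _ in range(I)]

# Assumed gate: softmax over a linear projection of the input.
gates = softmax([sum(G[n][i] * x[i] for i in range(I)) for n in range(N)])

y_fact = factorized_moe_forward(x, gates, A, B, C)
y_dense = dense_moe_forward(x, gates, cp_weight_tensor(A, B, C))
```

In this sketch, increasing N only lengthens the gate vector and the factor A; the full weight tensor is materialized solely for the reference check. The paper's layers may use other factorizations and gating functions, so consult the paper and repository for the exact formulation.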

Cite this study

Oldfield et al. (2024). Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization.

DOI: https://doi.org/10.48550/arxiv.2402.12550

Also consider

Closely related papers on similar questions, for comparative context:

MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models· 2025
The Intersection of Modular Architectures and Scalable AI Systems· 2025
A Closer Look into Mixture-of-Experts in Large Language Models· 2024 · 4 citations
Multi-Head Mixture-of-Experts· 2024 · 5 citations
AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models· 2025

Authors

James Oldfield

Markos Georgopoulos

Grigorios G. Chrysos
