We present a multi-model analysis of expert specialization patterns across three Mixture-of-Experts (MoE) language models: OLMoE-1B-7B (64 experts, top-8, 16 layers), Qwen1.5-MoE-A2.7B (60 experts, top-4, 24 layers), and DeepSeek-MoE-16B (64 experts, top-6, 27 layers). Contrary to the common assumption that MoE experts specialize by semantic topic, our analysis reveals that, across all three architectures, experts primarily specialize by syntactic token type: content words, function words, punctuation, and capitalized tokens. The most topic-specialized expert achieves only 1.6–2.3× the uniform baseline. Expert selectivity follows a universal U-shaped curve across layers (high in early layers, low in middle layers, high in late layers), and co-activation clusters undergo a reorganization phase in the middle layers. These findings generalize across model sizes (7B–16B), expert counts (60–64), and routing strategies (top-4 to top-8). The package includes the paper, full analysis data for all three models (JSON), a reproduction guide, and figures. This work is a companion study to SpectralAI (DOI: 10.5281/zenodo.19457288).
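To make the "× the uniform baseline" figure concrete, here is a minimal sketch of one way such a ratio could be computed. The abstract does not give the metric's definition, so this is an assumption: each expert's share of a category's routing selections is divided by the uniform share 1/E, where E is the number of experts. The function name `specialization_ratios` and the toy counts are hypothetical and are not taken from the paper's data.

```python
def specialization_ratios(counts):
    """Ratio of each expert's routing share to the uniform share 1/E.

    counts[t][e] is the number of times expert e was selected for tokens
    of category t (a topic or a token type). This definition is an
    assumption for illustration; the paper may define its baseline
    differently.
    """
    ratios = []
    for row in counts:
        total = sum(row)          # all routing selections for this category
        num_experts = len(row)    # E, so the uniform share is 1 / E
        # share / (1 / E) simplifies to count * E / total
        ratios.append([c * num_experts / total for c in row])
    return ratios


# Hypothetical toy counts: 2 categories, 4 experts.
toy_counts = [
    [30, 10, 10, 10],  # category 0 leans on expert 0
    [15, 15, 15, 15],  # category 1 is routed uniformly
]
ratios = specialization_ratios(toy_counts)
print(ratios[0][0])  # 2.0: expert 0 gets twice its uniform share
print(ratios[1])     # [1.0, 1.0, 1.0, 1.0]
```

Under this definition, a ratio of 1.0 means an expert is selected for a category exactly as often as uniform routing would predict, so the reported 1.6–2.3× range reads as a modest preference instead of a dedicated topic expert.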
Jordi Silvestre Lopez (Tue,) studied this question.
www.synapsesocial.com/papers/69d894ce6c1944d70ce05b23 — DOI: https://doi.org/10.5281/zenodo.19457411