What type of study is this?

This is a quantitative study.

October 2, 2025

Open Access

MuBench: Assessment of Multilingual Capabilities of Large Language Models Across 61 Languages

Key Points

Multilingual LLMs show notable gaps between claimed and actual language coverage, especially for low-resource languages.
MuBench provides a more comprehensive evaluation across 61 languages, addressing limitations of previous datasets.
Evaluations indicate a persistent performance disparity between English and low-resource languages, underscoring the need for targeted improvements.
Multilingual Consistency (MLC) is proposed as a new metric for identifying performance bottlenecks in multilingual LLMs.

Abstract

Multilingual large language models (LLMs) are advancing rapidly, with new models frequently claiming support for an increasing number of languages. However, existing evaluation datasets are limited and lack cross-lingual alignment, leaving assessments of multilingual capabilities fragmented in both language and skill coverage. To address this, we introduce MuBench, a benchmark covering 61 languages and evaluating a broad range of capabilities. We evaluate several state-of-the-art multilingual LLMs and find notable gaps between claimed and actual language coverage, particularly a persistent performance disparity between English and low-resource languages. Leveraging MuBench's alignment, we propose Multilingual Consistency (MLC) as a complementary metric to accuracy for analyzing performance bottlenecks and guiding model improvement. Finally, we pretrain a suite of 1.2B-parameter models on English and Chinese with 500B tokens, varying language ratios and parallel data proportions to investigate cross-lingual transfer dynamics.
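
The summary does not spell out how MLC is computed. As a rough illustration only, the Python sketch below assumes a simple proxy: because MuBench items are aligned across languages, one can compare a model's answers to the same item in different languages and report how often they agree, next to per-language accuracy. The functions `accuracy` and `pairwise_consistency` and the toy data are hypothetical and are not the paper's definition or code.

```python
from itertools import combinations

# Hypothetical illustration: MuBench's actual MLC formula is not given in this summary.


def accuracy(preds: list[str], gold: list[str]) -> float:
    """Fraction of items where the prediction matches the gold answer."""
    if len(preds) != len(gold) or not gold:
        raise ValueError("preds and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def pairwise_consistency(preds_by_lang: dict[str, list[str]]) -> float:
    """Assumed consistency proxy: for every pair of languages, the fraction of
    aligned items that receive the same predicted answer, averaged over pairs.
    Gold labels are not used, so a model can be consistent and still wrong."""
    langs = sorted(preds_by_lang)
    if len(langs) < 2:
        raise ValueError("need predictions for at least two languages")
    n = len(preds_by_lang[langs[0]])
    if n == 0 or any(len(preds_by_lang[lang]) != n for lang in langs):
        raise ValueError("predictions must be non-empty and aligned item-by-item")
    pair_scores = []
    for a, b in combinations(langs, 2):
        agree = sum(x == y for x, y in zip(preds_by_lang[a], preds_by_lang[b]))
        pair_scores.append(agree / n)
    return sum(pair_scores) / len(pair_scores)


# Toy multiple-choice answers for four aligned items (invented for illustration).
gold = ["A", "B", "C", "D"]
preds = {
    "en": ["A", "B", "C", "D"],
    "zh": ["A", "B", "C", "A"],
    "sw": ["B", "B", "D", "A"],
}
for lang, p in preds.items():
    print(lang, accuracy(p, gold))
print("consistency", pairwise_consistency(preds))
```

Running it prints accuracies of 1.0, 0.75 and 0.25 for `en`, `zh` and `sw`, and a consistency of 0.5. Because this proxy compares languages with one another rather than with gold labels, it can show whether a weak language fails on the same items as English or on different ones, which accuracy alone does not reveal.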

Authors

Wenhan Han

Yifan Zhang

Zhixun Chen
