October 8, 2025 | Open Access

MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems

Key Points

On MMSciBench, even the best-performing model reached only 63.77% accuracy, indicating significant limitations in scientific reasoning.
Leading models struggled most with visual reasoning, pointing to a clear need for improvement on multimodal tasks.
This analysis establishes MMSciBench as a vital benchmark for advancing research in multimodal scientific reasoning.
With open-source code and a comprehensive dataset, MMSciBench aims to promote enhanced evaluation methodologies in the field.

Abstract

Recent advances in large language models (LLMs) and vision-language models (LVLMs) have shown promise across many tasks, yet their scientific reasoning capabilities remain untested, particularly in multimodal settings. We present MMSciBench, a benchmark for evaluating mathematical and physical reasoning through text-only and text-image formats, with human-annotated difficulty levels, solutions with detailed explanations, and taxonomic mappings. Evaluation of state-of-the-art models reveals significant limitations, with even the best model achieving only 63.77% accuracy and particularly struggling with visual reasoning tasks. Our analysis exposes critical gaps in complex reasoning and visual-textual integration, establishing MMSciBench as a rigorous standard for measuring progress in multimodal scientific understanding. The code for MMSciBench is open-sourced at GitHub, and the dataset is available at Hugging Face.

Cite this study

Ye et al. studied this question.

www.synapsesocial.com/papers/68e6a0f4718ef0a556b33d66 — DOI: https://doi.org/10.48550/arxiv.2503.01891

Authors

Xinwu Ye

Chengfan Li

Siming Chen
