What type of study is this?

This is an experimental study.

September 28, 2025

Open Access

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Key Points

LMM-R1 improves reasoning in multimodal domains by adapting rule-based reinforcement learning strategies.
Experiments on Qwen2.5-VL-Instruct-3B show a 4.83% average improvement over baselines on multimodal benchmarks, supporting the effectiveness of the approach.
The two-stage framework enhances reasoning abilities using text data before generalizing these capabilities across multimodal scenarios.
This approach addresses the data limitations inherent to multimodal reasoning, potentially reducing the need for extensive multimodal training data.
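
The ordering of the two stages can be sketched as a simple schedule: text-only training first (Foundational Reasoning Enhancement), then multimodal training (Multimodal Generalization Training). This is only an illustrative sketch, not the authors' code. The names `Stage`, `two_stage_schedule`, `run_schedule`, and the `train_fn` callback are hypothetical, and the actual RL update is left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical helper type, not from the paper)."""
    name: str
    modality: str  # "text" or "multimodal"


def two_stage_schedule() -> List[Stage]:
    # Text-only reasoning enhancement comes first, multimodal generalization second.
    return [
        Stage(name="FRE", modality="text"),
        Stage(name="MGT", modality="multimodal"),
    ]


def run_schedule(policy: Any,
                 stages: List[Stage],
                 train_fn: Callable[[Any, Stage], Any]) -> Any:
    """Apply train_fn once per stage, feeding each stage's output into the next.

    train_fn is an assumed callback that performs the rule-based RL update
    for the given stage and returns the updated policy.
    """
    for stage in stages:
        policy = train_fn(policy, stage)
    return policy
```

The point of the design is that the second stage starts from the policy produced by the first, so reasoning gains from text-only data carry into multimodal training.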

Abstract

Enhancing reasoning in Large Multimodal Models (LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment. While rule-based reinforcement learning (RL) excels in text-only domains, its multimodal extension confronts two critical barriers: (1) data limitations due to ambiguous answers and scarce complex reasoning examples, and (2) degraded foundational reasoning induced by multimodal pretraining. To address these challenges, we propose LMM-R1, a two-stage framework adapting rule-based RL for multimodal reasoning through Foundational Reasoning Enhancement (FRE) followed by Multimodal Generalization Training (MGT). The FRE stage first strengthens reasoning abilities using text-only data with rule-based RL, then the MGT stage generalizes these reasoning capabilities to multimodal domains. Experiments on Qwen2.5-VL-Instruct-3B demonstrate that LMM-R1 achieves 4.83% and 4.5% average improvements over baselines in multimodal and text-only benchmarks, respectively, with a 3.63% gain in complex Football Game tasks. These results validate that text-based reasoning enhancement enables effective multimodal generalization, offering a data-efficient paradigm that bypasses costly high-quality multimodal training data.
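
To make "rule-based RL" concrete, the following is a minimal sketch of a rule-based reward: a response is scored by deterministic checks rather than by a learned reward model. The abstract does not specify the reward rules, so everything here is an assumption for illustration: the `<think>...</think>` plus `\boxed{...}` output convention, the function names, and the `format_weight` value of 0.5.

```python
import re

# Assumed output convention (not stated in the abstract): reasoning inside
# <think>...</think> at the start, final answer inside \boxed{...}.
THINK_RE = re.compile(r"^\s*<think>.*?</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def normalize(answer: str) -> str:
    """Remove all whitespace and lowercase, so '1 / 2' matches '1/2'."""
    return "".join(answer.split()).lower()


def format_reward(response: str) -> float:
    """1.0 if the response follows the assumed format, else 0.0."""
    has_think = THINK_RE.search(response) is not None
    has_boxed = BOXED_RE.search(response) is not None
    return 1.0 if has_think and has_boxed else 0.0


def accuracy_reward(response: str, reference: str) -> float:
    """1.0 if the last boxed answer matches the reference exactly, else 0.0."""
    answers = BOXED_RE.findall(response)
    if not answers:
        return 0.0
    return 1.0 if normalize(answers[-1]) == normalize(reference) else 0.0


def rule_based_reward(response: str, reference: str,
                      format_weight: float = 0.5) -> float:
    """Accuracy reward plus a weighted format reward (the weight is an assumption)."""
    return (accuracy_reward(response, reference)
            + format_weight * format_reward(response))
```

Exact-match checking of this kind only works when answers are unambiguous, which is one reason the abstract lists ambiguous answers as a data limitation for multimodal rule-based RL.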

Authors

Yingzhe Peng

Gongrui Zhang

Miaosen Zhang
