What type of study is this?

This is an experimental study.

October 8, 2025 (Open Access)

MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning

Key Points

MMedAgent-RL achieves an average performance gain of 20.7% over supervised fine-tuning baselines.
The framework uses reinforcement learning to train two general practitioner agents, a triage doctor and an attending physician, so that collaboration with specialists is dynamic rather than a fixed sequence.
Curriculum learning guides the training of the attending physician, which progressively learns to balance imitating specialists against correcting their mistakes.
Experiments on five medical VQA benchmarks show that the model outperforms both open-source and proprietary Med-LVLMs and exhibits human-like reasoning patterns.

Abstract

Medical Large Vision-Language Models (Med-LVLMs) have shown strong potential in multimodal diagnostic tasks. However, existing single-agent models struggle to generalize across diverse medical specialties, limiting their performance. Recent efforts introduce multi-agent collaboration frameworks inspired by clinical workflows, where general practitioners (GPs) and specialists interact in a fixed sequence. Despite improvements, these static pipelines lack flexibility and adaptability in reasoning. To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multi-specialists and its own knowledge to make final decisions. To address the inconsistency in specialist outputs, we introduce a curriculum learning (CL)-guided RL strategy that progressively teaches the attending physician to balance between imitating specialists and correcting their mistakes. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns. Notably, it achieves an average performance gain of 20.7% over supervised fine-tuning baselines.
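
The workflow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`triage`, `attend`, `curriculum_stage`), the keyword-matching triage rule, the majority-vote rule for the attending physician, and the easy/medium/hard staging by how many specialists answered correctly are all hypothetical stand-ins. In the paper both GP roles are Qwen2.5-VL models trained with RL, not hand-written rules.

```python
from collections import Counter
from typing import List

# Hypothetical stand-ins for illustration only; the real agents are
# RL-trained vision-language models, not hand-written rules.

def triage(question: str, specialties: List[str]) -> str:
    """Triage doctor stand-in: pick the first specialty named in the question."""
    lowered = question.lower()
    for name in specialties:
        if name in lowered:
            return name
    # Fall back to the first listed specialty when nothing matches.
    return specialties[0]

def attend(specialist_answers: List[str], own_answer: str) -> str:
    """Attending physician stand-in: follow a strict specialist majority,
    otherwise fall back on the attending physician's own answer.
    Expects a non-empty list of specialist answers."""
    answer, votes = Counter(specialist_answers).most_common(1)[0]
    if votes > len(specialist_answers) / 2:
        return answer
    return own_answer

def curriculum_stage(specialist_answers: List[str], gold: str) -> str:
    """Assumed curriculum rule: grade a training sample by how reliable
    the specialists were on it."""
    correct = sum(a == gold for a in specialist_answers)
    if correct == len(specialist_answers):
        return "easy"    # imitating the specialists is enough
    if correct == 0:
        return "hard"    # the attending physician must correct them
    return "medium"      # mixed signals: weigh specialists against own knowledge
```

Under these assumptions, a curriculum would present "easy" samples first and "hard" samples last, so the attending physician first learns to follow reliable specialists and only later to override them.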

Authors

Peng Xia

Jinglu Wang

Yibo Peng