What type of study is this?

This is an experimental study.

October 13, 2025 | Open Access

FunBench: Benchmarking Fundus Reading Skills of MLLMs

Key Points

FunBench reveals significant deficiencies in MLLMs' fundus reading skills, especially on basic tasks.
Experiments showed that MLLMs struggled even with laterality recognition (telling whether a fundus image shows the left or the right eye), underscoring gaps in how these models are evaluated.
The benchmark organizes tasks hierarchically across four levels (modality perception, anatomy perception, lesion analysis, and disease diagnosis) to assess MLLMs comprehensively.
Effective fundus image analysis will require domain-specific training as well as improved large language models and vision encoders.
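
The four-level hierarchy lends itself to reporting results level by level. As a minimal sketch, assuming graded answers are available as hypothetical (level, is_correct) pairs, per-level accuracy could be tallied as below. The record format and the per_level_accuracy helper are illustrative only and are not part of FunBench.

```python
from collections import defaultdict

# The four levels named in the FunBench abstract, ordered from basic to advanced.
LEVELS = ("modality perception", "anatomy perception", "lesion analysis", "disease diagnosis")


def per_level_accuracy(records):
    """Return {level: accuracy} from hypothetical (level, is_correct) pairs.

    Levels with no answered questions are reported as None rather than 0.0,
    so "not evaluated" stays distinct from "always wrong".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, is_correct in records:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        total[level] += 1
        correct[level] += int(is_correct)
    # Preserve the basic-to-advanced ordering in the returned dict.
    return {lvl: (correct[lvl] / total[lvl] if total[lvl] else None) for lvl in LEVELS}
```

Keeping the levels in a fixed tuple makes it easy to see whether a model's accuracy drops, or unexpectedly does not drop, as tasks move from perception toward diagnosis.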

Abstract

Multimodal Large Language Models (MLLMs) have shown significant potential in medical image analysis. However, their capabilities in interpreting fundus images, a critical skill for ophthalmology, remain under-evaluated. Existing benchmarks lack fine-grained task divisions and fail to provide a modular analysis of an MLLM's two key modules, i.e., the large language model (LLM) and the vision encoder (VE). This paper introduces FunBench, a novel visual question answering (VQA) benchmark designed to comprehensively evaluate MLLMs' fundus reading skills. FunBench features a hierarchical task organization across four levels (modality perception, anatomy perception, lesion analysis, and disease diagnosis). It also offers three targeted evaluation modes: linear-probe based VE evaluation, knowledge-prompted LLM evaluation, and holistic evaluation. Experiments on nine open-source MLLMs plus GPT-4o reveal significant deficiencies in fundus reading skills, particularly in basic tasks such as laterality recognition. The results highlight the limitations of current MLLMs and emphasize the need for domain-specific training and improved LLMs and VEs.


Cite this study

Wei et al. FunBench: Benchmarking Fundus Reading Skills of MLLMs.

www.synapsesocial.com/papers/68ecc715d1cc7436f7d18c2d
DOI: https://doi.org/10.48550/arxiv.2503.00901

Authors

Qijie Wei

Kui Qian

Xirong Li