March 3, 2026 · Open Access

Comparative performance of GPT-4, GPT-o3, GPT-5, Gemini-3-Flash, and DeepSeek-R1 in ophthalmology question answering

Key Points

In ophthalmology question answering, GPT-o3 and Gemini-3-Flash show the strongest accuracy and stability, which makes them candidates for clinical decision support.
Notably, GPT-5 did not exceed its predecessor's accuracy or stability for medical questions.
Prompt engineering has only a limited effect on model performance for closed-ended medical questions.
Future research should focus on multimodal integration and validation in actual healthcare settings.

Abstract

GPT-o3 and Gemini-3-Flash achieve superior stability and accuracy in ophthalmology question answering (QA), making them suitable for high-stakes clinical decision support. The open-source model DeepSeek-R1 shows competitive potential, especially in complex tasks. Notably, GPT-5 failed to surpass its predecessor in either accuracy or consistency in this specialized domain. Prompt engineering has a limited impact on performance for closed-ended medical questions. Future work should extend to multimodal integration and real-world clinical validation to enhance the practical utility and reliability of large language models (LLMs) in medicine.

Cite this study

Zhang et al. (2026) studied this question.

www.synapsesocial.com/papers/69a75cdec6e9836116a261ac — DOI: https://doi.org/10.3389/fcell.2026.1744389

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Large language models encode clinical knowledge · 2023 · 3,023 citations
GPT-4 and Ophthalmology Operative Notes · 2023 · 49 citations
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing · 2022 · 3,547 citations
GsMTx4-blocked PIEZO1 channel promotes myogenic differentiation and alleviates myofiber damage in Duchenne muscular dystrophy · 2025 · 7 citations
Assessing the possibility of using large language models in ocular surface diseases

Authors

Ping Zhang

Jiaoman Wang

Xinya Hu

Journals

Frontiers in Cell and Developmental Biology

Institutions

Wenzhou Medical University

Shenzhen Second People's Hospital

Affiliated Eye Hospital of Wenzhou Medical College
