GPT-o3 and Gemini-3-Flash achieve superior stability and accuracy in ophthalmology Question Answering (QA), making them suitable for high-stakes clinical decision support. The open-source model DeepSeek-R1 shows competitive potential, especially in complex tasks. Notably, GPT-5 failed to surpass its predecessor in both accuracy and consistency in this specialized domain. Prompt engineering has a limited impact on performance for closed-ended medical questions. Future work should extend to multimodal integration and real-world clinical validation to enhance the practical utility and reliability of LLMs in medicine.
Zhang et al. studied this question.
www.synapsesocial.com/papers/69a75cdec6e9836116a261ac — DOI: https://doi.org/10.3389/fcell.2026.1744389
Ping Zhang
Jiaoman Wang
Xinya Hu
Frontiers in Cell and Developmental Biology
Wenzhou Medical University
Shenzhen Second People's Hospital
Affiliated Eye Hospital of Wenzhou Medical College