< 0.05) but not for Safety. Compared with ChatGPT, lower odds of higher ratings were seen for Grok (OR 0.48) and DeepSeek (OR 0.61). Inter-rater reliability indicated moderate agreement (Fleiss' κ = 0.59) and strong consensus (Gwet's AC1 = 0.87).

Conclusion
ChatGPT showed superior accuracy and clarity, while Gemini and Llama excelled in educational value and safety. High expert agreement supports AI chatbots as adjuncts in pediatric ophthalmology education, although they require continued validation.
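The two reliability figures reported above (Fleiss' κ and Gwet's AC1) can differ noticeably on the same ratings, because they estimate chance agreement differently. The following is a minimal sketch of the standard textbook formulas, not the authors' analysis code: it assumes every item is rated by the same number of raters, and the count matrix `hypothetical_counts` is invented purely for illustration and is unrelated to the study data.

```python
from typing import List


def _observed_agreement(counts: List[List[int]]) -> float:
    """Mean pairwise agreement across items (same for both statistics)."""
    r = sum(counts[0])  # raters per item, assumed constant
    per_item = [
        (sum(c * c for c in row) - r) / (r * (r - 1)) for row in counts
    ]
    return sum(per_item) / len(per_item)


def _category_shares(counts: List[List[int]]) -> List[float]:
    """Overall proportion of ratings that fall in each category."""
    n_items = len(counts)
    r = sum(counts[0])
    n_cats = len(counts[0])
    return [
        sum(row[k] for row in counts) / (n_items * r) for k in range(n_cats)
    ]


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa: chance agreement is the sum of squared shares."""
    p_obs = _observed_agreement(counts)
    p_chance = sum(p * p for p in _category_shares(counts))
    return (p_obs - p_chance) / (1 - p_chance)


def gwet_ac1(counts: List[List[int]]) -> float:
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (q - 1)."""
    shares = _category_shares(counts)
    q = len(shares)
    p_obs = _observed_agreement(counts)
    p_chance = sum(p * (1 - p) for p in shares) / (q - 1)
    return (p_obs - p_chance) / (1 - p_chance)


# Hypothetical data: 10 items, 3 raters, 2 categories.
# Each row holds the number of raters choosing each category.
# Most ratings fall in the first category (a skewed distribution).
hypothetical_counts = [[3, 0]] * 8 + [[2, 1]] * 2

if __name__ == "__main__":
    print("Fleiss' kappa:", round(fleiss_kappa(hypothetical_counts), 3))
    print("Gwet's AC1:  ", round(gwet_ac1(hypothetical_counts), 3))
```

When most ratings cluster in one category, as in this made-up matrix, κ's chance-agreement term becomes large and pulls κ down, while AC1 stays closer to the raw observed agreement. That is one common reason the two statistics are reported together.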
Dhiman Shweta
Dutta Paromita
Thacker Prolima
European Journal of Ophthalmology
Post Graduate Institute of Medical Education and Research
Maulana Azad Medical College
Central Rice Research Institute
Shweta et al. studied this question.
www.synapsesocial.com/papers/69fd7ef7bfa21ec5bbf07406 — DOI: https://doi.org/10.1177/11206721261445529