This study evaluated the accuracy, readability, and comprehensiveness of patient-facing responses generated by LLM-based chatbot platforms to pediatric contact lens (CL)–related questions, using expert grading and readability benchmarking. Five platforms (ChatGPT-4o, Gemini 1.5, Perplexity, Copilot, and Claude 3.5 Sonnet) were assessed using 28 curated questions. Two pediatric ophthalmologists graded anonymized outputs using DISCERN and PEMAT-P, 5-point Likert scales for accuracy and comprehensiveness, and multiple automated readability indices. Expert-written responses were included only for readability benchmarking. ChatGPT-4o produced the longest responses (p<0.0001). Accuracy and comprehensiveness differed across platforms (p=0.0216 and p=0.0067, respectively), with ChatGPT-4o scoring higher than Perplexity in post-hoc comparisons (p=0.0173 and p=0.0087, respectively). Expert responses were shorter but showed higher complexity on readability indices. Accuracy-based reproducibility was high for general pediatric CL queries but lower for aphakic CL–related questions (p=0.041), and factual inaccuracies were more frequent in aphakic topics. While LLMs may support patient education, variability in correctness and completeness underscores the need for expert oversight; these tools should complement, not replace, clinical expertise in pediatric CL usage.
Mehmet Ömer Kırıştıoğlu
Meral Yıldız
Sevde Isleker
Uludağ Üniversitesi Tıp Fakültesi Dergisi
Bursa Uludağ Üniversitesi
Bursa Technical University
Türksat (Turkey)
Kırıştıoğlu et al. studied this question.
www.synapsesocial.com/papers/69ba432b4e9516ffd37a41fe — DOI: https://doi.org/10.32708/uutfd.1780297