Background: ChatGPT is a large language model (LLM) online chatbot developed by OpenAI and launched in November 2022. Early adoption studies have shown high readiness to use this technology for health-related questions and self-diagnosis. However, the quality and clinical adequacy of its health-related responses remain incompletely characterized. This study aimed to explore the responses generated by ChatGPT-3.5 and ChatGPT-4.0 to common patient questions regarding scoliosis.

Methods: Ten scoliosis-related frequently asked questions (FAQs) were selected from a larger pool of over 250 patient-facing questions compiled from 17 publicly available FAQ webpages and informed by a Google Trends analysis. Questions were harmonized, grouped by theme, and then reduced by rule-based expert review to a final set intended to represent common patient concerns.

Results: The median ratings of ChatGPT-3.5 and ChatGPT-4.0 responses ranged from 2 (satisfactory, requiring minimal clarification) to 3 (satisfactory, requiring moderate clarification). Across the ten matched questions, no statistically detectable difference was found between the models in this study setting (W = 8.0, p = 0.59; Cliff’s δ = −0.12, 95% CI −0.58 to 0.40). However, given the small question set, unblinded rating process, and poor inter-rater reliability, this should not be interpreted as evidence of equivalence, non-inferiority, or comparable model performance. The results apply only to the 10–15 April 2024 online snapshots of ChatGPT-3.5 and ChatGPT-4.0 and should not be generalized to later model iterations.

Conclusions: This study should be interpreted as a clinically oriented observational report, intended to inform physician awareness and patient-physician communication rather than to validate chatbot accuracy or safety. In this 10–15 April 2024 sample, the outputs of both models frequently required clinician clarification. Given the small FAQ set, low inter-rater reliability, unblinded design, and single-sample outputs, the findings do not establish equivalence or superiority and apply only to the specific 10–15 April 2024 model snapshots and evaluated questions.
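For readers unfamiliar with the effect size reported in the Results, the following is a minimal sketch of how Cliff's δ can be computed from two sets of ordinal ratings. It is not the authors' analysis code: the function name `cliffs_delta` and the two rating lists are hypothetical and chosen only for illustration, and the abstract does not state which model was treated as the first sample or how the confidence interval was obtained.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Return Cliff's delta for two samples of ordinal values.

    Delta is the proportion of cross pairs (x, y) with x > y minus the
    proportion with x < y, so it lies between -1 and 1. Ties count for
    neither side.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical ratings for ten questions (NOT the study data).
ratings_model_a = [2, 3, 2, 2, 3, 2, 3, 2, 2, 3]
ratings_model_b = [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]

if __name__ == "__main__":
    delta = cliffs_delta(ratings_model_a, ratings_model_b)
    print(f"Cliff's delta: {delta:.2f}")
```

A value near 0 means neither sample tends to receive higher ratings than the other; values near −1 or 1 mean one sample is almost always lower or higher. With only ten questions, a wide interval such as the one reported above is expected, which is consistent with the authors' caution against reading the result as equivalence.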
Vu-Han et al. studied this question.
www.synapsesocial.com/papers/69d895206c1944d70ce062ae — DOI: https://doi.org/10.3390/jpm16040206
Tu-Lan Vu-Han
Enikő Regényi
Vikram Sunkara
Journal of Personalized Medicine
Cornell University
Charité - Universitätsmedizin Berlin
Hospital for Special Surgery