Artificial intelligence (AI) has shown potential for enhancing medical practice and improving patient outcomes. However, the efficacy and linguistic accessibility of large language models (LLMs) in pediatric asthma management remain underexplored. This study evaluated the performance of four LLMs in generating clinical information within this domain. We administered 15 guideline-based pediatric asthma inquiries to ChatGPT-4o, Claude 3 Opus, Gemini 2.0, and DeepSeek. Anonymized responses were independently evaluated by three board-certified pediatric pulmonologists using the DISCERN instrument (score range 16–80). Readability was assessed using six standard indices. Inter-rater reliability was measured with intraclass correlation coefficients (ICC). Statistical analysis included repeated measures and post-hoc comparisons with effect size reporting. No significant difference was found in the overall quality of health information (DISCERN scores) among the four LLMs (F(3,56) = 0.144, p = .933, η² = 0.008), with all mean scores clustered within a narrow “fair-to-good” range (50.3–51.9). However, significant differences were observed in readability: ChatGPT-4o generated significantly more comprehensible text than DeepSeek (FRE mean difference = 12.41, p = .005, Cohen’s d = 1.28), and DeepSeek performed significantly worse than all other models (all p < .05). Inter-rater reliability was high (ICC range: 0.849–0.901, all p < .001). Critically, the mean readability level of all outputs (FKGL: 13.2–14.9) far exceeded the recommended reading level for patient materials. While current LLMs can provide generally accurate information on pediatric asthma, their outputs are too difficult to read for patient-facing use. ChatGPT-4o shows relative advantages in comprehensibility, yet none of the models meets recommended health-literacy standards.
These findings underscore that AI should serve as a supplementary decision‑support tool under clinician supervision, not as a substitute for professional medical advice. Future work should prioritize the integration of adaptive text‑simplification features, validate AI‑generated content in real‑world clinical and caregiver settings, and expand evaluations to include emerging models and diverse chronic disease contexts.
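The abstract reports Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) values but does not describe how they were computed. The following is a minimal sketch of the two standard formulas, not the authors' pipeline. The helper names (`count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions made for illustration; dedicated readability tools use more careful syllable and sentence detection, so their scores will differ somewhat.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard FRE formula; higher scores mean easier text."""
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard FKGL formula; the result approximates a US school grade."""
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using simple regex tokenization."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, syllables


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study's model outputs.
    sample = "Use the inhaler every day. Call the doctor if breathing gets worse."
    counts = text_counts(sample)
    print("FRE :", round(flesch_reading_ease(*counts), 2))
    print("FKGL:", round(flesch_kincaid_grade(*counts), 2))
```

Because FKGL maps onto US school grades, the reported range of 13.2–14.9 corresponds to college-level text, which is why the abstract describes the outputs as exceeding the level recommended for patient materials.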
Ying-Qi Hang
Jie Wu
Li Bai
BMC Medical Informatics and Decision Making
Shanghai University of Traditional Chinese Medicine
Shaanxi University of Chinese Medicine
Shanghai Traditional Chinese Medicine Hospital
Hang et al. studied this question.
www.synapsesocial.com/papers/6a08efbf3589fa5d64d60c0a — DOI: https://doi.org/10.1186/s12911-026-03371-x