Introduction: Large language models (LLMs) such as ChatGPT have the potential to improve patient education, but their role in pediatric plastic surgery counseling remains underexplored. This study evaluated ChatGPT-4o’s responses to common parent questions across 4 pediatric craniofacial procedures using 5 metrics: DISCERN, specificity, Flesch-Kincaid Grade Level (FKGL), emotion scoring, and the Patient Education Materials Assessment Tool (PEMAT).

Methods: Twelve standardized vignettes were developed covering cleft lip and palate, craniosynostosis, facial trauma from a dog bite, and otoplasty. Each vignette included prompts on surgical risks, recovery, and procedure-specific concerns. All prompts were submitted on the same day from the same ChatGPT-4o profile. Two board-certified plastic surgeons rated the responses with DISCERN. Two medical students rated specificity and emotion on a 5-point Likert scale. Readability was calculated with FKGL, and PEMAT was used to assess understandability and actionability.

Results: The mean DISCERN score was 43.7/75 (reliability 23.8/40, treatment quality 20.3/35). Mean specificity ranged from 1.7 (craniosynostosis) to 3.0 (otoplasty and dog bite). The average FKGL was 9.5 (roughly a 10th-grade reading level). The mean emotion score was 3.1/5. PEMAT scores averaged 62% for understandability and 27% for actionability; facial trauma responses scored highest in both domains.

Conclusions: ChatGPT-4o produced organized, accessible responses but underperformed in reliability, quality, specificity, and actionability. Its reading level exceeded the sixth- to eighth-grade level recommended for patient education materials. Emotional tone was moderate but not consistently tailored to sensitive pediatric contexts. These findings suggest that ChatGPT is insufficient for unsupervised use. With refinement, LLMs may support, but not replace, physician-led counseling in pediatric craniofacial surgery.
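Two of the metrics above reduce to simple formulas: FKGL is 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, and a PEMAT domain score is the points earned divided by the number of applicable items, times 100 (items rated N/A are excluded). The sketch below illustrates both under stated assumptions. The function names are hypothetical, and the syllable counter is a naive vowel-group heuristic, not the readability tool used in the study (which is not named here), so its grade levels will not exactly match published values.

```python
import re
from typing import List, Optional


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of vowels, with 'y' as a vowel.
    Dictionary-based readability tools will give different counts."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # A trailing silent "e" (as in "rate") usually adds no syllable; "-le" (as in "table") does.
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    # Naive sentence split on terminal punctuation; abbreviations like "Dr." will miscount.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)


def pemat_percent(ratings: List[Optional[int]]) -> float:
    """PEMAT domain score: each item is 1 (agree), 0 (disagree), or None (not applicable).
    Score = points earned / applicable items * 100."""
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100 * sum(applicable) / len(applicable)
```

In use, each chatbot response would be passed to `flesch_kincaid_grade`, and each rater's item-level PEMAT judgments for a response would be passed to `pemat_percent` separately for the understandability and actionability item sets. Excluding N/A items from the denominator is why a response can score well on actionability even if several actionability items do not apply to it.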
Miller et al. (Wed,) studied this question.