May 20, 2026

Quality of Responses Generated by Artificial Intelligence Chatbots for Frequently Asked Questions by Caregivers of Presurgical Nasoalveolar Molding Therapy

Abstract

Objective: This study compared the quality of responses generated by three artificial intelligence chatbots (AICs) to frequently asked questions (FAQs) from caregivers about presurgical nasoalveolar molding (PNAM) therapy.

Material and Methods: Twenty-three FAQs on PNAM were posed to WhatsApp Meta AI (Llama 4), ChatGPT-4o, and Gemini 2.5 Flash under the same conditions. Three orthodontists evaluated and compared the responses for accuracy, completeness, reliability (Modified DISCERN score), readability (Flesch-Kincaid Reading Ease [FKRE] score and Simple Measure of Gobbledygook [SMOG] index), and global quality score (GQS).

Results: The responses from Gemini and ChatGPT were more accurate than those from Meta AI (medians of 5.67, 5.33, and 5, respectively; P < .001). Gemini outperformed the others in completeness (median of 3 vs 2.33, P < .001) and reliability (means of 3.41 ± 0.27, 3.13 ± 0.24, and 2.98 ± 0.58, P < .001), whereas Meta AI's responses were more readable (mean FKRE of 43.2 ± 7.97, 39.9 ± 10.3, 36.7 ± 9.17, and SMOG of 9.99 ± 1.32, 11.24 ± 2, 12.46 ± 1.4). For global quality, Gemini fared best, followed by ChatGPT and Meta AI (median GQS of 4.67, 4.33, and 3.67, respectively; P < .001).

Conclusions: All the AICs performed well in terms of accuracy, moderately in completeness and reliability, and sub-optimally in readability. Meta AI showed comparatively lower accuracy and completeness, but better readability, than the other two AICs. These findings highlight the potential use of AI chatbots as adjunct tools for caregiver education on PNAM and the need to optimize the content before clinical use.
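For readers unfamiliar with the two readability measures named above, the following is a minimal sketch of the standard published formulas for the Flesch Reading Ease score and the SMOG index. It assumes that "FKRE" refers to the Flesch Reading Ease formula; the abstract does not state which tool or formula variant the authors used, and the function names and count-based inputs here are illustrative only.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text.

    Assumption: this is the measure the abstract abbreviates as FKRE.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_index(polysyllables: int, sentences: int) -> float:
    """Standard SMOG grade formula; higher grades mean harder text.

    `polysyllables` is the number of words with three or more syllables.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for a short passage, not data from the study.
    print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 2))
    print(round(smog_index(polysyllables=30, sentences=30), 2))
```

On the Flesch scale, scores between roughly 30 and 50 are conventionally described as difficult to read, which is consistent with the abstract's description of readability as sub-optimal.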
