What question did this study set out to answer?

This study aims to evaluate the accuracy and completeness of AI chatbots in delivering information on central auditory processing disorder (CAPD).

April 18, 2026

Expert Evaluation of Artificial Intelligence Chatbots for Central Auditory Processing Disorder Information

Key points

This study aims to evaluate the accuracy and completeness of AI chatbots in delivering information on central auditory processing disorder (CAPD).
The authors evaluated three AI chatbots (ChatGPT, Gemini, and Claude) on 44 questions about CAPD.
Questions were categorized into four difficulty levels (patient level, easy, intermediate, and specialized), with 11 questions in each.
Seven clinical experts, blinded to chatbot identity, rated responses for accuracy and completeness on a 1–5 Likert scale.
Data were analyzed with analyses of variance, correlations, and interrater comparisons.
Mean accuracy of chatbot responses was below 4.0, with completeness about 3.5.
Complex questions often scored below 3.0, and performance declined as difficulty increased, although the differences were not statistically significant.
Only three of the 44 questions received high ratings (≥ 4) for both accuracy and completeness across all three chatbots.

Abstract

Purpose: Artificial intelligence (AI) chatbots based on large language models (LLMs) can deliver medical information, but their performance on specialized topics such as central auditory processing disorder (CAPD) remains unexplored. This study evaluated the accuracy and completeness of three AI chatbots (ChatGPT, Gemini, and Claude) in providing CAPD-related information across varying levels of question complexity.

Method: Forty-four questions, categorized into four difficulty levels (patient level, easy, intermediate, and specialized; n = 11 each), were submitted to each chatbot, generating 132 responses. Seven clinical experts, blinded to chatbot identity, independently rated accuracy and completeness on a 1–5 Likert scale. Data were analyzed with analyses of variance, correlations, and interrater comparisons.

Results: Chatbot performance was similar, with mean accuracy below 4.0 and completeness about 3.5. Complex questions often scored below 3.0 across experts. Only three of the 44 questions, primarily patient level or relatively simple, received consistently high expert ratings (≥ 4 for both accuracy and completeness) across all three chatbots. Performance declined with question difficulty, although differences were not statistically significant. Accuracy and completeness were correlated across chatbots.

Conclusions: Current AI chatbots provided generally accurate CAPD information but fell short of clinical standards, particularly on specialized questions. Their limited performance underscores the need for clinician oversight in CAPD assessment and management. Chatbots may serve as helpful adjuncts but should not replace expert evaluation and guidance in clinical settings.

Supplemental Material: https://doi.org/10.23641/asha.31975101
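The design described in the Method can be sketched in a few lines of Python to show how the reported counts fit together. This is only an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the total number of ratings assumes that every expert rated every response on both dimensions, and the "consistently high" check assumes it is applied to per-chatbot mean scores, which the abstract does not specify.

```python
from itertools import product

# Design parameters as stated in the abstract.
CHATBOTS = ["ChatGPT", "Gemini", "Claude"]
DIFFICULTY_LEVELS = ["patient level", "easy", "intermediate", "specialized"]
QUESTIONS_PER_LEVEL = 11
N_EXPERTS = 7
DIMENSIONS = ["accuracy", "completeness"]


def build_response_grid():
    """Return one (chatbot, level, question index) tuple per chatbot response."""
    question_indices = range(1, QUESTIONS_PER_LEVEL + 1)
    return list(product(CHATBOTS, DIFFICULTY_LEVELS, question_indices))


def count_ratings(n_responses):
    """Total individual ratings.

    Assumption (not stated in the source): every expert rated every
    response on both dimensions.
    """
    return n_responses * N_EXPERTS * len(DIMENSIONS)


def is_consistently_high(mean_scores_by_chatbot, threshold=4.0):
    """Check the '>= 4 for both dimensions across all chatbots' criterion.

    Assumption: mean_scores_by_chatbot maps each chatbot name to a
    (mean accuracy, mean completeness) pair for a single question.
    """
    return all(
        accuracy >= threshold and completeness >= threshold
        for accuracy, completeness in mean_scores_by_chatbot.values()
    )


if __name__ == "__main__":
    grid = build_response_grid()
    n_questions = len(DIFFICULTY_LEVELS) * QUESTIONS_PER_LEVEL
    print("questions:", n_questions)
    print("responses:", len(grid))
    print("ratings (under the stated assumption):", count_ratings(len(grid)))
```

Under these assumptions the grid reproduces the 44 questions and 132 responses given in the abstract; the ratings total is derived here and is not a figure reported by the study.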

Cite this study

Davidson et al. studied this question.

www.synapsesocial.com/papers/69e3205140886becb653f666 — DOI: https://doi.org/10.1044/2026_aja-25-00224

Authors

Alyssa J. Davidson

W. Wiktor Jedrzejczak

Jennifer McCullagh

Journals

American Journal of Audiology

Institutions

University of Arizona

Aristotle University of Thessaloniki

Walter Reed National Military Medical Center
