This paper presents three descriptive validation studies of a large language model (LLM) conversation agent designed to support families and carers with autistic children. In the first study, an LLM's responses to 400 parent/carer questions were assessed by experienced human evaluators against 10 criteria across 3 domains: safety, empathy and utility. In the second study, the LLM's capacity to identify safeguarding issues was evaluated. In the third study, responses to 50 parent/carer questions from the LLM and from health clinicians were blind-rated by an experienced evaluator and compared. The LLM's responses were rated as safe, empathetic and useful. The LLM identified and correctly flagged the safeguarding issue in 100% of the presented questions. The ratings for the LLM's and the clinicians' responses were highly correlated, and the evaluator was able to distinguish the provenance of the responses (74%). This is the first deployment of a comprehensive evaluation model that uses human ratings to scrutinize the output of an LLM designed to support families with autistic children. It provides a demonstration of how LLMs have the potential to be safe, empathetic and clinically useful tools for responding to the unmet support needs of parents and carers of autistic children.
Freddy Jackson Brown
Isabelle Stewart Muscat
Louise Quinn
Scientific Reports
University of Warwick
Brown et al. studied this question.
www.synapsesocial.com/papers/69d895796c1944d70ce06876 — DOI: https://doi.org/10.1038/s41598-026-44254-5