Abstract
As artificial intelligence systems become increasingly embedded in socially sensitive contexts, a central question arises: can they replicate complex forms of human social cognition? To investigate this question, we developed and validated a novel full-face cognitive empathy task designed to probe nuanced dimensions such as moral judgement, intention attribution and interpersonal trust. The task was administered to 230 human participants and five leading artificial intelligence models (ChatGPT-4o, Claude, Gemini, Grok and Mistral). Hierarchical clustering based on Jaccard distance revealed that ChatGPT-4o, Grok and Gemini formed a cohesive cluster closely aligned with the responses observed in the human sample, while Claude diverged and Mistral showed partial overlap. Fisher's exact tests confirmed that the ChatGPT-4o, Grok and Gemini cluster differed minimally from humans across all dimensions. These findings demonstrate that general-purpose artificial intelligence systems can now functionally simulate nuanced dimensions of cognitive empathy inference with surprising fidelity, as reflected in their alignment with the response patterns observed in the human participants of this study. This opens the door to real-world applications such as social cognitive virtual assistants, diagnostic tools in mental health, conflict resolution systems, socially aware robots and adaptive educational platforms. However, the observed variability between models cautions against assuming uniform performance. Our paradigm provides a rigorous benchmark for evaluating social cognition in artificial intelligence and supports its responsible deployment in socially complex environments.
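The clustering step named in the abstract can be illustrated with a minimal sketch. It assumes each responder is represented as a set of (item, answer) pairs and uses average linkage; the abstract does not state the linkage criterion or the data encoding, so both are assumptions, and the responder names and toy answers below are hypothetical, not data from the study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets of (item, answer) pairs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    names = list(profiles)
    # Pairwise Jaccard distances between individual responders.
    dist = {
        frozenset((x, y)): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs.
        pairs = [dist[frozenset((x, y))] for x in c1 for y in c2]
        return sum(pairs) / len(pairs)

    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters at each step.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical toy answers on five items ("t" = trust, "d" = distrust).
profiles = {
    "human_mode": {(1, "t"), (2, "d"), (3, "t"), (4, "d"), (5, "t")},
    "model_a": {(1, "t"), (2, "d"), (3, "t"), (4, "d"), (5, "d")},
    "model_b": {(1, "t"), (2, "d"), (3, "d"), (4, "t"), (5, "t")},
    "model_c": {(1, "d"), (2, "t"), (3, "d"), (4, "t"), (5, "d")},
}

merges = average_linkage(profiles)
for left, right, d in merges:
    print(f"{left} + {right} at distance {d:.3f}")
```

In this toy example the responder that shares the most answers with the human profile merges with it first, and the responder with no shared answers joins last, which is the kind of structure a dendrogram over Jaccard distances would display.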
Carlota Márquez-Pedregal
Patricia Pantaleón-Menéndez
Óscar Delgado Ben Mohatar
Royal Society Open Science
Universidad Autónoma de Madrid
Hospital Universitario Ramón y Cajal
Universidad Rey Juan Carlos
Márquez-Pedregal et al. studied this question.
www.synapsesocial.com/papers/69d8968f6c1944d70ce080aa — DOI: https://doi.org/10.1098/rsos.251314