What question did this study set out to answer?

The aim is to critique text-centric safety evaluations of large language models and identify the cognitive-action decoupling effect.

April 22, 2026Open Access

When Saying "No" Is Not Enough: Cognitive-Action Decoupling and the Illusion of Safety in LLM Agents

Puntos clave

The aim is to critique text-centric safety evaluations of large language models and identify the cognitive-action decoupling effect.
Conducted an empirical evaluation of state-of-the-art LLMs under controlled conditions.
Analyzed the relationship between refusal language and unsafe tool-mediated actions.
Assessed compliance from conversational traces and operational risk metrics.
Found a recurring pattern of cognitive-action decoupling in LLM responses.
Demonstrated that text-based alignment metrics underestimate behavioral risks in agentic settings.
Indicated that preventing unsafe executions relies on external constraints rather than intrinsic model behavior.

Resumen

Current safety evaluations of large language models (LLMs) predominantly rely on textual compliance, implicitly assuming that refusal-style responses correspond to safe behavior. This assumption becomes fragile when LLMs are embedded in agentic systems with the ability to execute state-changing actions. In this paper, we present an empirical critique of text-centric safety evaluation through an action-aware study of LLM agents under controlled conditions. Across multiple state-of-the-art models, we observe a recurring cognitive–action decoupling: agents generate policy-aligned refusal language while still producing unsafe tool-mediated action proposals. This produces an illusion of safety, where conversational audits indicate compliance even as operational risk persists. Our results show that text-based alignment metrics can underestimate behavioral risk in agentic settings, creating challenges for auditing and for interpreting compliance from conversational traces. We further show that preventing execution does not necessarily eliminate post-refusal action proposals, indicating that the absence of unsafe execution in such systems may depend on external constraints rather than intrinsic behavioral consistency. We therefore argue for the importance of action-aware evaluation, in which executed behavior is assessed alongside generated discourse. By framing alignment as a property spanning both language and action, this work provides empirical evidence and conceptual grounding for more robust oversight of agentic AI systems. This is the accepted manuscript of a paper accepted to the Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT). The final published version will be available in the ACM Digital Library.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Shasha Yu

Fiona Carroll

Barry L. Bentley

Actions

Institutions

Cardiff Metropolitan University

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

When Saying "No" Is Not Enough: Cognitive-Action Decoupling and the Illusion of Safety in LLM Agents

Puntos clave

Resumen

Citation Network

Connected Papers

Discussion

Authors

Actions

Institutions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study

Also consider