Current safety evaluations of large language models (LLMs) predominantly rely on textual compliance, implicitly assuming that refusal-style responses correspond to safe behavior. This assumption becomes fragile when LLMs are embedded in agentic systems with the ability to execute state-changing actions. In this paper, we present an empirical critique of text-centric safety evaluation through an action-aware study of LLM agents under controlled conditions. Across multiple state-of-the-art models, we observe a recurring cognitive–action decoupling: agents generate policy-aligned refusal language while still producing unsafe tool-mediated action proposals. This produces an illusion of safety, where conversational audits indicate compliance even as operational risk persists. Our results show that text-based alignment metrics can underestimate behavioral risk in agentic settings, creating challenges for auditing and for interpreting compliance from conversational traces. We further show that preventing execution does not necessarily eliminate post-refusal action proposals, indicating that the absence of unsafe execution in such systems may depend on external constraints rather than intrinsic behavioral consistency. We therefore argue for the importance of action-aware evaluation, in which executed behavior is assessed alongside generated discourse. By framing alignment as a property spanning both language and action, this work provides empirical evidence and conceptual grounding for more robust oversight of agentic AI systems.

This is the accepted manuscript of a paper to appear in the Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT). The final published version will be available in the ACM Digital Library.
Shasha Yu
Fiona Carroll
Barry L. Bentley
Cardiff Metropolitan University
www.synapsesocial.com/papers/69e866c96e0dea528ddeb27c — DOI: https://doi.org/10.5281/zenodo.19663620