What question did this study set out to answer?

This survey examines the trustworthiness of agentic AI systems, focusing on safety, robustness, privacy, and system security.

May 5, 2026

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Key Points

Reviewed concepts related to safety, robustness, privacy, and system security in agentic AI.
Identified risk areas in the agent workflow and proposed targeted mitigation strategies.
Consolidated metrics and benchmarks to evaluate trustworthiness for agentic systems.
Outlined key challenges such as self-evolving agents and runtime monitoring.
Discussed a case study highlighting security failures in open-source agentic systems.
Provided guidance on metrics for evaluating safety and robustness in AI deployments.

Abstract

Agentic AI systems, meaning Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions, can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: (1) Safety and Robustness and (2) Privacy and System Security. For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters. To support consistent comparison and deployment decisions, we consolidate evaluation into a unified metrics-and-benchmarks hub, emphasizing both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offering scenario-to-metric guidance for release gating. We conclude by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust–utility trade-off, and we present a case study of real-world security failures in open-source agentic systems (OpenClaw/Moltbook). Our goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.

Authors

Jinhu Qi

Muzhi Li

Jiahong Liu

Institutions

Chinese University of Hong Kong

Fudan University

Shanghai Academy of Environmental Sciences
