What type of study is this?

This is a Literature Review study (also classified as: Observational).

August 26, 2025 · Open Access

Evaluating Agentic AI Systems: A Balanced Framework for Performance, Robustness, Safety and Beyond

Key Points

Agentic AI systems can yield productivity gains of 20–60%, yet often lack assessments of fairness and trust.
The proposed evaluation framework covers capabilities, robustness, safety, and economic sustainability among others.
Multidimensional evaluation combines automated metrics with human evaluations for a more comprehensive assessment.
This approach supports the responsible adoption of agentic AI in high-stakes domains, addressing overlooked sociotechnical dimensions.

Abstract

Agentic artificial intelligence (AI) systems, which are multi-agent systems that combine large language models with external tools and autonomous planning, are rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions like human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability & efficiency, robustness & adaptability, safety & ethics, human-centred interaction and economic & sustainability) and introduce novel indicators including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments illustrate that agentic AI can yield 20–60% productivity gains yet often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, which combines automated metrics with human-in-the-loop scoring and economic analysis, is essential for responsible adoption of agentic AI.
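As a rough illustration of what a multidimensional scorecard could look like, the sketch below blends an automated metric with a human rating on each of the five axes named in the abstract, then reports the weakest axis instead of a single average. Only the axis names come from the abstract. The `AxisScore` class, the `scorecard` and `weakest_axis` functions, the [0, 1] score range and the `human_weight` parameter are all hypothetical choices made for this example, and none of it should be read as the paper's actual scoring method.

```python
from dataclasses import dataclass

# The five axis names come from the abstract. Everything else (score ranges,
# the blending weight, the function names) is a hypothetical illustration.
AXES = (
    "capability & efficiency",
    "robustness & adaptability",
    "safety & ethics",
    "human-centred interaction",
    "economic & sustainability",
)


@dataclass(frozen=True)
class AxisScore:
    automated: float  # automated metric, assumed normalised to [0, 1]
    human: float      # human-in-the-loop rating, assumed normalised to [0, 1]

    def blended(self, human_weight: float = 0.5) -> float:
        """Weighted mean of the automated and human scores."""
        return (1.0 - human_weight) * self.automated + human_weight * self.human


def scorecard(scores: dict[str, AxisScore], human_weight: float = 0.5) -> dict[str, float]:
    """Return one blended score per axis; reject evaluations that skip an axis."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"unassessed axes: {missing}")
    return {axis: scores[axis].blended(human_weight) for axis in AXES}


def weakest_axis(card: dict[str, float]) -> str:
    """Name the lowest-scoring axis, so a strong average cannot hide it."""
    return min(card, key=card.get)
```

Rejecting an incomplete evaluation, instead of silently averaging over whatever axes happen to be present, is one way to make an omitted fairness or sustainability assessment visible. Whether the paper treats omissions this way is not stated in the abstract.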

Cite this study

Manish Shukla studied this question.

www.synapsesocial.com/papers/68af6216ad7bf08b1eae36fd — DOI: https://doi.org/10.31224/5195
