What type of study is this?

This is a Literature Review study (also classified as: Observational).

August 26, 2025 · Open Access

Evaluating Agentic AI Systems: A Balanced Framework for Performance, Robustness, Safety and Beyond

Key Points

The multidimensional evaluation highlights the importance of balancing performance with ethical compliance and human trust.
Case studies from recent industry deployments report productivity gains of 20–60% from agentic AI systems, yet these deployments often omit assessments of fairness, trust and long-term sustainability.
The analysis recommends combining automated metrics with human-in-the-loop scoring and economic analysis.
Implementing this framework could lead to more responsible adoption of agentic AI by addressing critical gaps.

Abstract

Agentic artificial intelligence (AI) systems (multi-agent systems that combine large language models with external tools and autonomous planning) are rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions like human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability and efficiency, robustness and adaptability, safety and ethics, human-centred interaction, and economic sustainability) and introduce novel indicators including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments illustrate that agentic AI can yield 20–60% productivity gains yet often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, combining automated metrics with human-in-the-loop scoring and economic analysis, is essential for responsible adoption of agentic AI.
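
As an illustration only, a five-axis evaluation that pairs automated metrics with human scoring could be recorded along the lines of the sketch below. This is not the paper's method: the axis identifiers, the [0, 1] normalisation, the equal weighting of automated and human scores, and the helper names `AxisScore`, `scorecard` and `weakest_axis` are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean

# The five axes named in the abstract; the identifiers are illustrative.
AXES = (
    "capability_efficiency",
    "robustness_adaptability",
    "safety_ethics",
    "human_centred_interaction",
    "economic_sustainability",
)


@dataclass
class AxisScore:
    automated: float  # automated metric, assumed normalised to [0, 1]
    human: float      # human-in-the-loop rating, assumed normalised to [0, 1]

    def combined(self) -> float:
        # Assumption: automated and human scores are weighted equally.
        return mean([self.automated, self.human])


def scorecard(scores: dict[str, AxisScore]) -> dict[str, float]:
    """Return one combined score per axis, refusing incomplete evaluations."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"unmeasured axes: {missing}")
    return {axis: scores[axis].combined() for axis in AXES}


def weakest_axis(card: dict[str, float]) -> str:
    # Report the minimum rather than an average, so that a strong axis
    # cannot mask a weak one.
    return min(card, key=card.get)
```

Raising an error when an axis is missing reflects the abstract's concern that deployments often leave fairness, trust and sustainability unassessed; a real protocol would need the paper's own indicator definitions, which this page does not give.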

Cite this study

M. K. Shukla studied this question.

www.synapsesocial.com/papers/68af63ddad7bf08b1eae409a — DOI: https://doi.org/10.20944/preprints202508.1847.v1
