Agentic artificial intelligence (AI), meaning multi-agent systems that combine large language models with external tools and autonomous planning, is rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions such as human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability & efficiency, robustness & adaptability, safety & ethics, human-centred interaction, and economic & sustainability) and introduce novel indicators, including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies between the axes, and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments show that agentic AI can yield 20–60% productivity gains, yet these deployments often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, combining automated metrics with human-in-the-loop scoring and economic analysis, is essential for the responsible adoption of agentic AI.
Manish Shukla studied this question.
www.synapsesocial.com/papers/68af6216ad7bf08b1eae36fd — DOI: https://doi.org/10.31224/5195