Agentic artificial intelligence (AI) systems, which are multi-agent systems that combine large language models with external tools and autonomous planning, are rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions like human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability and efficiency, robustness and adaptability, safety and ethics, human-centred interaction, and economic sustainability) and introduce novel indicators including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments illustrate that agentic AI can yield productivity gains of 20–60%, yet these deployments often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, combining automated metrics with human-in-the-loop scoring and economic analysis, is essential for responsible adoption of agentic AI.
M. K. Shukla (Tue,) studied this question.
www.synapsesocial.com/papers/68af63ddad7bf08b1eae409a — DOI: https://doi.org/10.20944/preprints202508.1847.v1