May 21, 2026 | Open Access

Unified Benchmark for Agentic Governance Across Power, Finance, and Software

Key Points

This study proposes U-Bench, a synthetic benchmark for evaluating agentic governance across power, finance, and software operations.
U-Bench integrates tasks from power, finance, and software governance in a shared evaluation harness.
A simulated evaluation tests whether U-Bench can differentiate systems by their governance capabilities.
The benchmark combines operational tasks, including telemetry, retrieval, incident response, and policy explanation, into one schema.
U-Bench distinguishes systems that appear strong on accuracy from systems that are actually governable.
The evaluation reveals trade-offs among recovery speed, tail-risk control, evidence traceability, contract compliance, and latency.

Abstract

Agentic governance systems are being proposed for power-grid monitoring, financial retrieval, cloud incident response, documentation maintenance, policy explanation, and cross-cloud model operations. These systems use different data types and operational controls, but they share recurring governance questions: whether actions are grounded in evidence, whether tools are invoked under contracts, whether risks are bounded, whether latency is acceptable, and whether explanations remain traceable after domain transfer. This paper proposes U-Bench, a synthetic unified benchmark for evaluating agentic governance across power, finance, and software operations. U-Bench combines grid telemetry tasks, financial RAG tasks, incident-response tasks, legal-policy explanation tasks, calibration and forecasting tasks, and cross-cloud workload-routing tasks into a shared evaluation harness. It builds on "Low-Latency Grid Intelligence with Self-Governing Stream and Calibration Agents," "Risk-Aware Financial RAG with Distributional Retrieval Policies," "Contract-Driven Multi-Agent Incident Response for Cloud-Native Platforms," and "Meme-Aware Legal and Policy RAG for Explainable Governance," bringing their tasks into one benchmark schema. In a simulated evaluation, U-Bench distinguishes systems that appear strong on accuracy from systems that are actually governable, revealing trade-offs among recovery speed, tail-risk control, evidence traceability, contract compliance, and latency.
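
To make the distinction between accuracy and governability concrete, the following is a minimal sketch of how a shared result schema and scorecard could look. It is not the U-Bench implementation: the `Domain`, `TaskResult`, and `scorecard` names, the field set, the task identifiers, and the 500 ms latency budget are all hypothetical choices made for illustration. Tail-risk control and recovery speed are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """The three operational domains named in the abstract."""
    POWER = "power"
    FINANCE = "finance"
    SOFTWARE = "software"


@dataclass(frozen=True)
class TaskResult:
    """One system's outcome on one task (hypothetical schema, not U-Bench's)."""
    task_id: str
    domain: Domain
    correct: bool             # did the system produce the right answer or action
    evidence_cited: bool      # was the action grounded in traceable evidence
    contract_respected: bool  # were tools invoked within their declared contract
    latency_ms: float         # end-to-end response time


def rate(results, predicate):
    """Fraction of results for which predicate(result) is true."""
    return sum(1 for r in results if predicate(r)) / len(results)


def scorecard(results, latency_budget_ms=500.0):
    """Report accuracy next to governance rates (budget is an assumed value)."""
    if not results:
        raise ValueError("no results to score")

    def in_budget(r):
        return r.latency_ms <= latency_budget_ms

    return {
        "accuracy": rate(results, lambda r: r.correct),
        "evidence": rate(results, lambda r: r.evidence_cited),
        "contract": rate(results, lambda r: r.contract_respected),
        "latency": rate(results, in_budget),
        # A task counts as governable only if every check passes at once.
        "governable": rate(
            results,
            lambda r: r.correct
            and r.evidence_cited
            and r.contract_respected
            and in_budget(r),
        ),
    }


# Made-up results for a single system, one row per task.
results = [
    TaskResult("grid-001", Domain.POWER, True, True, True, 120.0),
    TaskResult("rag-001", Domain.FINANCE, True, False, True, 340.0),
    TaskResult("inc-001", Domain.SOFTWARE, True, True, False, 900.0),
    TaskResult("doc-001", Domain.SOFTWARE, False, True, True, 210.0),
]
card = scorecard(results)
print(card)
```

With these made-up rows the system scores 0.75 on accuracy but only 0.25 on the joint governability check, because each failure falls on a different task. Reporting the joint rate alongside the per-criterion rates is one way a harness could expose the kind of gap the abstract describes.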
