What question did this study set out to answer?

To develop a hybrid RL-DRO framework that effectively manages uncertainties in renewable energy systems.

May 8, 2026 · Open Access

Autonomous policy evolution and decision robustness in hybrid learning-optimization frameworks for energy systems with distributed renewables

Key Points

Aimed to develop a hybrid RL-DRO framework that manages uncertainty in renewable energy systems.
Integrated a multi-agent reinforcement learning structure with Wasserstein-metric distributionally robust optimization (DRO).
Ran a case study on a renewable-dominated microgrid; training converged within 4000 iterations.
Evaluated performance by expected cost reduction and robustness improvement.
Achieved a 9.7% reduction in expected cost compared with stochastic optimization.
Improved robustness by 28% compared with stochastic optimization.
Reduced emissions from 200 to 140 tCO₂ across learning epochs.

Abstract

This study presents a hybrid reinforcement learning–assisted distributionally robust optimization (RL–DRO) framework for resilient and low-carbon energy system operation under uncertainty. The proposed model integrates a multi-agent reinforcement learning structure with a Wasserstein-metric distributionally robust formulation to capture both adaptive decision-making and conservative risk management. Reinforcement learning agents, representing distributed subsystems such as renewable generators, storage units, and flexible loads, are trained to minimize a composite objective combining expected cost and risk, while the DRO layer ensures robustness against distributional ambiguity. A case study on a renewable-dominated microgrid demonstrates that the RL–DRO framework converges smoothly within 4000 training iterations, achieving a 9.7% reduction in expected cost and a 28% improvement in robustness compared with stochastic optimization. The optimal ambiguity radius balances efficiency and resilience, while renewable curtailment and storage utilization exhibit clear compensatory dynamics across uncertainty scenarios. Emission trajectories show an exponential decay from 200 to 140 tCO₂ across learning epochs, confirming the model’s ability to internalize environmental objectives. Overall, the RL–DRO architecture unifies data-driven learning and mathematical robustness, enabling distributed agents to achieve stable coordination and sustainable operation under high renewable penetration. The framework establishes a practical foundation for intelligent, risk-aware, and carbon-efficient decision-making in modern power systems.
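
The relationship between the expected-cost term, the risk term, and the ambiguity radius described in the abstract can be illustrated with a minimal sketch. This is not the authors' model. It assumes a 1-Wasserstein ball around the empirical cost distribution and a loss that is Lipschitz in the uncertain parameter, in which case the worst-case expected loss is bounded above by the empirical mean plus the radius times the Lipschitz constant. The choice of CVaR as the risk measure, the weights, the sample costs, and all function names are hypothetical and chosen only for illustration.

```python
import math


def cvar(costs, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) share of sampled costs."""
    ordered = sorted(costs, reverse=True)
    # Keep at least one sample in the tail.
    k = max(1, math.ceil((1 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


def composite_objective(costs, risk_weight=0.3, alpha=0.9):
    """Weighted sum of expected cost and CVaR (hypothetical weighting)."""
    expected = sum(costs) / len(costs)
    return (1 - risk_weight) * expected + risk_weight * cvar(costs, alpha)


def wasserstein_upper_bound(costs, radius, lipschitz):
    """Upper bound on the worst-case expected cost over a 1-Wasserstein ball.

    For a loss that is `lipschitz`-Lipschitz in the uncertain parameter,
    the worst case over all distributions within `radius` of the empirical
    one is at most: empirical mean + radius * lipschitz.
    """
    return sum(costs) / len(costs) + radius * lipschitz


if __name__ == "__main__":
    # Hypothetical sampled operating costs, not data from the study.
    sample_costs = [100.0, 120.0, 90.0, 150.0]
    print("composite objective:", composite_objective(sample_costs, 0.3, 0.5))
    for radius in (0.0, 1.0, 2.0):
        bound = wasserstein_upper_bound(sample_costs, radius, lipschitz=5.0)
        print(f"radius={radius}: worst-case bound={bound}")
```

In this sketch the bound grows linearly with the radius. That growth is the price of conservatism that a well-chosen ambiguity radius trades against protection from distribution shift.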

Authors

Yongle Zheng

Shiqian Wang

Zhongfu Tan

Journals

Scientific Reports

Institutions

North China Electric Power University

Economic Research Institute
