What question did this study set out to answer?

The aim is to enhance energy management in power routers by integrating advanced optimization techniques.

April 10, 2026 | Open Access

Energy collaborative optimization of power routing based on PPO and generative adversarial imitation learning

Key Points

Integrated Proximal Policy Optimization (PPO) with a multi-agent framework
Implemented Generative Adversarial Imitation Learning (GAIL) with a double-buffer mechanism
Evaluated performance in experiments spanning 420 training rounds
Average round reward stabilized at about −410 after 420 training rounds
Kept the DC bus voltage between 728 V and 732 V
Achieved an electricity cost of 3846.36 yuan and a total runtime of 53.32 seconds, both lower than the two comparison models
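For readers unfamiliar with PPO, the clipped surrogate objective it is generally known for can be sketched as below. This is a minimal illustration of standard PPO, not the authors' implementation: the function name `ppo_clip_loss`, the NumPy formulation, and the default `clip_eps=0.2` are assumptions made for this example, and the page gives no hyperparameters for the study itself.

```python
import numpy as np


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the current and the old policy.
    ratio = np.exp(new_logp - old_logp)

    # Unclipped and clipped surrogate terms.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO maximizes the element-wise minimum; negate to obtain a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the two policies agree the ratio is 1 and the loss reduces to the negative mean advantage; the clipping only limits how much a single update can gain from moving the policy far from the one that collected the data.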

Abstract

As the global energy transition advances, the share of renewable energy in the power sector continues to grow. Power routers play an important role in improving energy utilization efficiency and keeping power systems stable. However, the intermittent and uncertain nature of distributed energy makes energy management in power routers difficult, and traditional optimization methods adapt poorly to these conditions. This study therefore proposes integrating Proximal Policy Optimization (PPO) with a multi-agent framework and combining it with Generative Adversarial Imitation Learning (GAIL) based on a double-buffer mechanism. The double-buffer mechanism is used to improve data utilization efficiency and training stability and to optimize communication and collaboration among the agents, thereby achieving collaborative energy optimization of power routers. Experimental results show that after 420 training rounds the average round reward of the improved algorithm stabilizes at about −410, and its strategy loss function is the first to stabilize, after 500 rounds. In practical scenarios, the proposed model keeps the DC bus voltage within a range of 728 V to 732 V. Its electricity cost is 3846.36 yuan and its total runtime is 53.32 seconds, both lower than those of the other two models. Overall, the improved algorithm and model notably improve the collaborative energy optimization of power routers and offer a practical solution to their energy management problems.
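The abstract does not spell out how the double-buffer mechanism works. One common reading of the term is a pair of buffers in which agents write new transitions to one while the learner samples from the other, with the two swapped once the write side fills. The sketch below illustrates that generic pattern only; the class name `DoubleBuffer`, its methods, and the swap-when-full rule are all hypothetical and are not taken from the paper.

```python
import random


class DoubleBuffer:
    """Generic double-buffer store for transitions (illustrative assumption).

    Agents append to the write buffer; the learner samples from the read
    buffer. When the write buffer reaches capacity, it becomes the new read
    buffer and a fresh write buffer is started.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._write = []  # being filled by the agents
        self._read = []   # being sampled by the learner

    def add(self, transition):
        self._write.append(transition)
        if len(self._write) >= self.capacity:
            self.swap()

    def swap(self):
        # Publish the filled buffer for training and start a new one.
        self._read, self._write = self._write, []

    def sample(self, batch_size, rng=random):
        # Nothing to train on until the first swap has happened.
        if not self._read:
            return []
        k = min(batch_size, len(self._read))
        return rng.sample(self._read, k)
```

Under this reading, training batches always come from a complete, fixed set of transitions, so data collection and learning do not interfere with each other, which is one plausible route to the stability benefit the abstract describes.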


Authors

Junyan Lyu

Jing Huang

Journals

PLoS ONE
