What question did this study set out to answer?

This research aims to develop a digital twin framework integrating deep reinforcement learning to improve energy efficiency in manufacturing systems.

June 1, 2026 · Open Access

A Conceptualized Digital Twin Framework for Energy-Efficient Control in Stochastic Manufacturing Using Deep Reinforcement Learning

Key Points

This research aims to develop a digital twin framework integrating deep reinforcement learning to improve energy efficiency in manufacturing systems.
Introduces a three-layer framework: physical, data, and digital layers.
Uses a model-free DRL agent based on Proximal Policy Optimization (PPO) to decide when machines are switched on and off.
Incorporates a novel reward system, combining a tailored reward function with a structured delivery method, that teaches the agent to save energy while preserving productivity.
Achieves 19.57% energy reduction while maintaining 99.6% productivity.
These results correspond to an approximately 12% improvement in energy efficiency and a 2% productivity gain compared to the most relevant prior study.

Abstract

The rising global demand for energy has intensified the push for sustainability in the manufacturing sector—one of the world’s largest energy consumers. While traditional manufacturing strategies often prioritize continuous machine operation to maximize output, such approaches frequently overlook opportunities for improving energy efficiency. In response, this research introduces a conceptualized Digital Twin (DT) framework that integrates Deep Reinforcement Learning (DRL) to reduce energy consumption in stochastic manufacturing systems without compromising productivity, while also facilitating the practical deployment of DRL solutions in real-world applications. The proposed framework is composed of three interconnected layers. The physical layer represents actual manufacturing entities, including machines, buffers, and control equipment. The data layer enables bidirectional communication between the physical and digital domains, ensuring real-time synchronization and data consistency. The digital layer incorporates a discrete-event simulation environment that mirrors the physical system and captures its complexity, including uncertainties such as variable part arrival times, processing durations, machine startup delays, breakdowns, and repair times. At the core of the digital layer is a model-free DRL agent based on Proximal Policy Optimization (PPO), which determines optimal machine on/off switching strategies. The agent is extensively fine-tuned through grid search to develop robust control policies capable of adapting to dynamic system conditions without requiring an explicit model of the environment. Additionally, a rule-based action validation step is incorporated to enhance safety and interpretability, enabling human-in-the-loop oversight where necessary. 
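The rule-based action validation step described above can be pictured as a thin filter between the agent and the machine. The following is a minimal sketch only: the abstract does not list the actual rules, so the action encoding, the `MachineState` fields, and every rule in `validate_action` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical action encoding; the abstract only states that the agent
# chooses machine on/off switching actions.
SWITCH_OFF, KEEP, SWITCH_ON = 0, 1, 2


@dataclass
class MachineState:
    is_on: bool
    is_processing: bool    # a part is currently being machined
    buffer_level: int      # parts waiting in the upstream buffer
    buffer_capacity: int


def validate_action(state: MachineState, proposed: int) -> Tuple[int, str]:
    """Return (action_to_apply, reason). All rules here are assumptions."""
    if proposed == SWITCH_OFF and state.is_processing:
        return KEEP, "blocked: machine is mid-cycle"
    if proposed == SWITCH_OFF and state.buffer_level >= state.buffer_capacity:
        return KEEP, "blocked: upstream buffer is full"
    if proposed == SWITCH_ON and state.is_on:
        return KEEP, "no-op: machine already on"
    if proposed == SWITCH_OFF and not state.is_on:
        return KEEP, "no-op: machine already off"
    return proposed, "accepted"
```

Returning a reason string alongside the action is one way to support the interpretability and human-in-the-loop oversight the abstract mentions: each override can be logged and reviewed by an operator.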
A key contribution of this work is the design of a novel reward system—featuring a tailored reward function and a structured delivery method—that enables the agent to learn effective energy-saving strategies while preserving productivity. Applied to a real-world machining workstation, the trained agent achieves 19.57% energy reduction while maintaining 99.6% productivity, corresponding to an approximately 12% improvement in energy efficiency and a 2% productivity gain compared to the most relevant prior study in the field.
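To make the energy/productivity trade-off concrete, here is a minimal sketch of what a per-step reward and an energy-saving metric could look like. It is not the authors' reward function, which the abstract does not specify: the weights `w_energy` and `w_throughput`, the linear form, and the choice of an always-on baseline for the percentage are all assumptions made for illustration.

```python
def step_reward(energy_kwh: float, parts_completed: int,
                w_energy: float = 1.0, w_throughput: float = 2.0) -> float:
    """Hypothetical reward: pay for finished parts, charge for energy used."""
    return w_throughput * parts_completed - w_energy * energy_kwh


def energy_reduction_pct(baseline_kwh: float, policy_kwh: float) -> float:
    """Percentage of energy saved relative to an assumed always-on baseline."""
    if baseline_kwh <= 0:
        raise ValueError("baseline_kwh must be positive")
    return 100.0 * (1.0 - policy_kwh / baseline_kwh)
```

Under these assumptions, a policy that consumes 80.43 kWh where the baseline consumes 100 kWh would score an `energy_reduction_pct` of about 19.57, the same magnitude as the reduction reported above. The relative size of the two weights controls how readily the agent gives up throughput to save energy.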

Cite This Study

Abadi et al. (2026) studied this question.

synapsesocial.com/papers/6a1d22db02fbce913063879f https://doi.org/10.1016/j.procir.2026.05.060
