What question did this study set out to answer?

This work aims to improve scheduling in flexible job shop environments using a novel reinforcement learning model.

May 2, 2026 · Open Access

Flexible Job Shop Scheduling Problem Based on Deep Reinforcement Learning Using Dual Attention Network

Key Points

Developed a deep reinforcement learning model that combines Multi-Proximal Policy Optimization (MPPO) with a Dual Attention Network (DAN) to solve the Flexible Job Shop Scheduling Problem (FJSP).
Used an operation message attention block and a machine message attention block to capture the dependencies among operations and the competition among machines.
Conducted experiments on the SD1 and SD2 datasets, comparing the proposed model with PPO-DAN, PPO-HGNN, traditional scheduling rules, and OR-Tools.
The algorithm reduces makespan by up to 4.2% on SD1 and 10.1% on SD2.
It outperforms traditional scheduling rules in efficiency and effectiveness.
Its overall performance is superior to that of the other methods compared.
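To make the makespan figures above concrete, the sketch below builds a small hypothetical FJSP instance (two jobs, two machines; not taken from SD1 or SD2), schedules it with a simple greedy earliest-completion-time dispatching rule, and computes the makespan and a percentage reduction. The rule, the instance, and the names `greedy_ect_schedule`, `relative_reduction`, and `TOY_JOBS` are illustrative assumptions, not the paper's baselines or code.

```python
from typing import Dict, List, Tuple

# jobs[j][k] maps each eligible machine to the processing time of operation k of job j.
Instance = List[List[Dict[int, int]]]
Entry = Tuple[int, int, int, int, int]  # (job, operation, machine, start, end)


def greedy_ect_schedule(jobs: Instance, num_machines: int) -> Tuple[int, List[Entry]]:
    """Hypothetical dispatching rule: repeatedly schedule the ready
    (operation, machine) pair that would finish earliest."""
    next_op = [0] * len(jobs)           # next unscheduled operation of each job
    job_ready = [0] * len(jobs)         # when each job's previous operation ends
    machine_ready = [0] * num_machines  # when each machine becomes free
    schedule: List[Entry] = []
    remaining = sum(len(ops) for ops in jobs)
    while remaining > 0:
        best = None  # (end, start, job, machine)
        for j, ops in enumerate(jobs):
            if next_op[j] == len(ops):
                continue  # job already finished
            for m, p in ops[next_op[j]].items():
                # Respect both the job's precedence chain and machine availability.
                start = max(job_ready[j], machine_ready[m])
                end = start + p
                if best is None or end < best[0]:
                    best = (end, start, j, m)
        end, start, j, m = best
        schedule.append((j, next_op[j], m, start, end))
        next_op[j] += 1
        job_ready[j] = end
        machine_ready[m] = end
        remaining -= 1
    makespan = max(entry[4] for entry in schedule)
    return makespan, schedule


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage makespan reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical toy instance: job 0 and job 1, each with two operations.
TOY_JOBS: Instance = [
    [{0: 3, 1: 5}, {1: 2}],
    [{0: 4}, {0: 2, 1: 3}],
]

if __name__ == "__main__":
    makespan, schedule = greedy_ect_schedule(TOY_JOBS, num_machines=2)
    print(makespan)  # 9
    print(f"{relative_reduction(makespan, 7):.1f}%")  # 22.2%
```

On this toy instance the greedy rule yields a makespan of 9, while the best schedule (job 1 running on machine 0 while job 0 runs entirely on machine 1) finishes at 7. The greedy rule's short-sighted first choice is the kind of weakness the abstract attributes to dispatching rules that cannot see global precedence and workload.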

Abstract

Industry 4.0 is transforming the way companies manufacture, improve, and distribute products, moving toward fast, intelligent, and flexible manufacturing that will fundamentally change enterprises’ production capabilities. The Flexible Job Shop Scheduling Problem (FJSP) allows a single job to be divided into multiple operations, each of which can be processed on any of several machines. Because of this high flexibility and complexity, traditional scheduling methods struggle to meet the needs of dynamic production. Dispatching rules struggle to perceive the global precedence relationships among jobs and the distribution of machine workloads; metaheuristic approaches converge slowly; and existing deep reinforcement learning methods often use a single policy network to handle operation sequencing and machine assignment in a coupled manner, which tends to cause training instability and slow convergence. This paper proposes a deep reinforcement learning model that integrates Multi-Proximal Policy Optimization (MPPO) and a Dual Attention Network (DAN) to address the FJSP. The model uses DAN's operation message attention block and machine message attention block to capture, respectively, the dependencies between operations and the dynamic competition between machines, and to extract deep features. MPPO uses two actor networks to handle operation sequencing and machine assignment decisions separately and combines them with a centralized critic to optimize the policy. This balances exploration and exploitation and improves training stability. Experiments are conducted on the SD1 and SD2 datasets. On FJSP instances of four scales, the model is compared with PPO-DAN, PPO-HGNN, traditional scheduling rules, and OR-Tools. The results show that the algorithm reduces makespan by up to 4.2% on SD1 and 10.1% on SD2 and outperforms traditional scheduling rules. Its overall performance is superior to that of the comparison methods, verifying its effectiveness and practical application potential for solving the FJSP.
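The abstract describes MPPO as two actor networks (one for operation sequencing, one for machine assignment) optimized together with a centralized critic. The sketch below shows one plausible reading of that loss structure in plain Python: a masked softmax that restricts each actor to feasible choices, the standard PPO clipped surrogate, and a one-step temporal-difference advantage from the critic that both actors share. The TD advantage, the clipping value `eps=0.2`, and the names `Step`, `clipped_surrogate`, and `mppo_losses` are assumptions for illustration; the paper's exact objective is not given here, and the DAN encoders that would produce the scores are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def masked_softmax(scores: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax that gives zero probability to masked-out entries, e.g.
    operations that are not ready or machines that cannot process the
    chosen operation. Assumes at least one entry is unmasked."""
    top = max(s for s, ok in zip(scores, mask) if ok)  # for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate(new_logp: float, old_logp: float, adv: float, eps: float) -> float:
    """Standard PPO clipped surrogate for a single chosen action."""
    ratio = math.exp(new_logp - old_logp)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


@dataclass
class Step:
    # Log-probabilities of the chosen operation / machine under the old and
    # current policies of the two (hypothetical) actors.
    op_logp_old: float
    op_logp_new: float
    mc_logp_old: float
    mc_logp_new: float
    reward: float
    value: float       # centralized critic estimate V(s_t)
    next_value: float  # centralized critic estimate V(s_{t+1}); 0 at episode end


def mppo_losses(steps: List[Step], gamma: float = 1.0,
                eps: float = 0.2) -> Tuple[float, float, float]:
    """Return (operation-actor loss, machine-actor loss, critic loss).

    Both actors use the same advantage from the single centralized critic;
    each actor's probability ratio is clipped independently."""
    n = len(steps)
    op_loss = mc_loss = critic_loss = 0.0
    for s in steps:
        target = s.reward + gamma * s.next_value  # one-step TD target (assumption)
        adv = target - s.value                     # shared advantage
        op_loss -= clipped_surrogate(s.op_logp_new, s.op_logp_old, adv, eps)
        mc_loss -= clipped_surrogate(s.mc_logp_new, s.mc_logp_old, adv, eps)
        critic_loss += (target - s.value) ** 2
    return op_loss / n, mc_loss / n, critic_loss / n
```

Sharing one advantage ties both decision heads to a single value estimate of the schedule state, while separate ratios and clipping let operation sequencing and machine assignment update without one head's large step dominating the other. This decoupling is the property the abstract credits with improved training stability.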


Cite This Study

Xu et al. studied this question.

synapsesocial.com/papers/69f594fc71405d493afffecd
https://doi.org/10.3390/pr14091419
