What question did this study set out to answer?

This research aims to improve coordinated traffic signal control across multiple intersections using a Policy-Regulated and Aligned Deep Q-Network (PRA-DQN).

February 5, 2026 · Open Access

Coordinated Multi-Intersection Traffic Signal Control Using a Policy-Regulated Deep Q-Network

Key Points

Introduced Policy-Regulated and Aligned Deep Q-Network (PRA-DQN) for signal control.
Developed a differentiable policy function for stable and interpretable behavior.
Created a cooperative reward structure optimizing local and regional efficiency.
Implemented a parameter-sharing multi-agent framework for scalability.
Conducted simulations on a 2 × 2 SUMO grid to evaluate performance.
Compared with fixed-time control, PRA-DQN reduced maximum queue length by 21.17%.
Average queue length decreased by 18.75% relative to fixed-time control.
Average waiting time dropped by 17.71% relative to fixed-time control.
PRA-DQN achieved an additional 7.53% reduction in average waiting time over classical DQN.
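
To make the cooperative reward idea concrete, here is a minimal sketch of how one intersection's reward might combine the three ingredients named in the abstract: local delay, movement pressure, and upstream-downstream interaction. The function names, the weights, the absolute-value pressure term, and the use of mean neighbor queue length as the interaction term are all illustrative assumptions; the summary does not give the paper's actual formula.

```python
def movement_pressure(incoming_queues, outgoing_queues):
    """Pressure: vehicles queued on incoming lanes minus vehicles on outgoing lanes."""
    return sum(incoming_queues) - sum(outgoing_queues)


def cooperative_reward(local_delay, incoming_queues, outgoing_queues,
                       neighbor_queues,
                       w_delay=1.0, w_pressure=0.5, w_neighbor=0.25):
    """Hypothetical per-intersection reward (higher is better).

    The weights w_* are assumed values for illustration only.
    """
    pressure = movement_pressure(incoming_queues, outgoing_queues)
    # Assumed interaction term: mean queue length at adjacent intersections,
    # so an agent is penalized when its neighbors are congested.
    if neighbor_queues:
        neighbor_term = sum(neighbor_queues) / len(neighbor_queues)
    else:
        neighbor_term = 0.0
    cost = (w_delay * local_delay
            + w_pressure * abs(pressure)
            + w_neighbor * neighbor_term)
    return -cost


# Example call with made-up observations for a single intersection.
reward = cooperative_reward(
    local_delay=10.0,
    incoming_queues=[4, 6],
    outgoing_queues=[2, 3],
    neighbor_queues=[8, 4],
)
```

Because the neighbor term enters every agent's reward, an action that clears the local queue by pushing vehicles into an already congested neighbor scores worse than it would under a purely local reward.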

Abstract

Coordinated control across multiple signalized intersections is essential for mitigating congestion propagation in urban road networks. However, existing DQN-based approaches often suffer from unstable action switching, limited interpretability, and insufficient capability to model spatial spillback between adjacent intersections. To address these limitations, this study proposes a Policy-Regulated and Aligned Deep Q-Network (PRA-DQN) for cooperative multi-intersection signal control. A differentiable policy function is introduced and explicitly trained to align with the optimal Q-value-derived target distribution, yielding more stable and interpretable policy behavior. In addition, a cooperative reward structure integrating local delay, movement pressure, and upstream–downstream interactions enables agents to simultaneously optimize local efficiency and regional coordination. A parameter-sharing multi-agent framework further enhances scalability and learning consistency across intersections. Simulation experiments conducted on a 2 × 2 SUMO grid show that PRA-DQN consistently outperforms fixed-time, classical DQN, distributed DQN, and pressure/wave-based baselines. Compared with fixed-time control, PRA-DQN reduces maximum queue length by 21.17%, average queue length by 18.75%, and average waiting time by 17.71%. Moreover, relative to classical DQN coordination, PRA-DQN achieves an additional 7.53% reduction in average waiting time. These results confirm the effectiveness and superiority of the proposed method in suppressing congestion propagation and improving network-level traffic performance. The proposed PRA-DQN provides a practical and scalable basis for real-time deployment of coordinated signal control and can be readily extended to larger networks and time-varying demand conditions.
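
The policy alignment step described in the abstract can be illustrated with a small sketch. It assumes the Q-value-derived target distribution is a Boltzmann (softmax) distribution over Q-values and that alignment is measured with a KL divergence. Both choices, along with the function names and the temperature parameter, are assumptions made for illustration; the abstract does not specify the paper's exact loss.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, policy, eps=1e-12):
    """KL(target || policy) for two discrete distributions of equal length."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, policy))


def alignment_loss(q_values, policy_logits, temperature=1.0):
    """Hypothetical alignment loss between a policy and a Q-derived target.

    A lower temperature sharpens the target toward the greedy action.
    """
    target = softmax(q_values, temperature)   # assumed target distribution
    policy = softmax(policy_logits)           # differentiable policy output
    return kl_divergence(target, policy)


# Made-up Q-values for three signal phases at one intersection.
q = [1.0, 2.0, 3.0]
aligned = alignment_loss(q, [1.0, 2.0, 3.0])     # policy agrees with target
misaligned = alignment_loss(q, [3.0, 2.0, 1.0])  # policy prefers the worst phase
```

Minimizing a loss of this kind pulls the policy toward the action preferences implied by the Q-values while keeping the output a smooth distribution rather than a hard argmax. A smooth output of this sort is one plausible way to obtain the more stable action switching the abstract reports.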

Cite This Study

Ma et al. studied this question.

synapsesocial.com/papers/6984347ff1d9ada3c1fb29fc https://doi.org/10.3390/su18031510
