Coordinated control across multiple signalized intersections is essential for mitigating congestion propagation in urban road networks. However, existing DQN-based approaches often suffer from unstable action switching, limited interpretability, and a limited ability to model spatial spillback between adjacent intersections. To address these limitations, this study proposes a Policy-Regulated and Aligned Deep Q-Network (PRA-DQN) for cooperative multi-intersection signal control. A differentiable policy function is introduced and explicitly trained to align with a target distribution derived from the optimal Q-values, yielding more stable and interpretable policy behavior. In addition, a cooperative reward structure that integrates local delay, movement pressure, and upstream–downstream interactions enables agents to optimize local efficiency and regional coordination simultaneously. A parameter-sharing multi-agent framework further enhances scalability and learning consistency across intersections. Simulation experiments on a 2 × 2 grid network in SUMO show that PRA-DQN consistently outperforms fixed-time, classical DQN, distributed DQN, and pressure/wave-based baselines. Compared with fixed-time control, PRA-DQN reduces maximum queue length by 21.17%, average queue length by 18.75%, and average waiting time by 17.71%. Relative to classical DQN coordination, PRA-DQN achieves an additional 7.53% reduction in average waiting time. These results indicate that the proposed method is effective in suppressing congestion propagation and improving network-level traffic performance. The proposed PRA-DQN provides a practical and scalable basis for real-time deployment of coordinated signal control and can be extended to larger networks and time-varying demand conditions.
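The two ideas summarized above, policy alignment and the cooperative reward, can be illustrated with a minimal sketch. It is not the formulation used in the study: it assumes the target distribution is a temperature-scaled softmax over Q-values, that alignment is measured by a KL divergence, and that the reward is a negative weighted sum of its three components. The function names, the temperature, and the weights are hypothetical and chosen only for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(policy_logits, q_values, temperature=1.0):
    """KL(target || policy) with target = softmax(Q / temperature).

    Assumption: the abstract only says the policy is aligned with a
    Q-value-derived target distribution; the softmax target and the KL
    measure are illustrative choices, not the paper's stated loss.
    """
    target = softmax(q_values, temperature)
    policy = softmax(policy_logits)
    return sum(t * math.log(t / p) for t, p in zip(target, policy) if t > 0.0)


def cooperative_reward(local_delay, pressure, neighbor_queue,
                       w_delay=1.0, w_pressure=0.5, w_neighbor=0.25):
    """Negative weighted cost over local and neighborhood terms.

    Assumption: the weights and the use of a neighbor queue length as the
    upstream-downstream interaction term are hypothetical placeholders.
    """
    return -(w_delay * local_delay
             + w_pressure * pressure
             + w_neighbor * neighbor_queue)


if __name__ == "__main__":
    q_values = [1.2, 0.4, -0.3, 0.9]       # one Q-value per signal phase
    policy_logits = [0.0, 0.0, 0.0, 0.0]   # untrained, uniform policy
    print("alignment loss:", alignment_loss(policy_logits, q_values))
    print("reward:", cooperative_reward(2.0, 4.0, 8.0))
```

Under these assumptions, the loss is zero when the policy distribution already matches the Q-derived target and grows as the two diverge, which is one way a policy can be regularized toward smoother phase selection than a hard argmax over Q-values. In a parameter-sharing setup, the same loss and reward functions would be evaluated for every intersection with a single set of network weights.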
Ma et al. (Mon,) studied this question.