Traffic signal control over urban networks requires coordinating the controllers of multiple signalized intersections toward a shared goal of minimizing network-wide congestion. Multi-agent reinforcement learning (MARL) methods have shown considerable promise in this setting. The epsilon–greedy exploration strategy adopted by many of these methods treats every candidate signal phase as equally worth trying, discarding the rich domain knowledge that traffic theory already provides. This paper proposes fuzzy-guided exploration, in which a multi-criteria fuzzy inference system uses local traffic conditions, with phase pressure as its primary input, to assign each candidate phase a priority. These priorities define a sampling distribution used in place of the uniform draw. We evaluate the method across four MARL algorithms covering independent learning (IQL) and the centralized training with decentralized execution paradigm (VDN, QMIX, and QPLEX) on both a synthetic grid and a real-world network. Fuzzy-guided exploration consistently improves upon the baseline in all combinations, with tangible gains on the synthetic grid and substantially larger improvements on the real-world network. These findings demonstrate that exploration is an effective intervention point for domain-knowledge integration in cooperative MARL, and that pressure-based scoring provides a well-suited signal to serve that role in traffic signal control.
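As a rough illustration of the idea, the following minimal sketch replaces the uniform draw in epsilon-greedy action selection with a draw weighted by a fuzzy priority computed from phase pressure. It is not the authors' implementation: the function names (`fuzzy_priority`, `select_phase`), the single-input membership functions, the rule outputs (0.1, 0.5, 1.0), and the normalization constant `max_pressure` are all assumptions made for illustration, whereas the paper describes a multi-criteria inference system.

```python
import random


def fuzzy_priority(pressure, max_pressure=20.0):
    """Map a phase pressure to a priority in [0.1, 1.0].

    Hypothetical single-input fuzzy system: three overlapping
    memberships (low, medium, high) over normalized pressure,
    defuzzified by a weighted average of fixed rule outputs.
    """
    # Normalize and clamp pressure to [0, 1]; max_pressure is an assumed scale.
    x = min(max(pressure / max_pressure, 0.0), 1.0)
    low = max(0.0, 1.0 - 2.0 * x)              # 1 at x=0, 0 from x=0.5 on
    med = max(0.0, 1.0 - abs(2.0 * x - 1.0))   # peaks at x=0.5
    high = max(0.0, 2.0 * x - 1.0)             # 0 up to x=0.5, 1 at x=1
    # Assumed rule outputs: low -> 0.1, medium -> 0.5, high -> 1.0.
    # The memberships sum to 1 on [0, 1], so the denominator is never zero.
    return (0.1 * low + 0.5 * med + 1.0 * high) / (low + med + high)


def select_phase(q_values, pressures, epsilon, rng=random):
    """Epsilon-greedy selection with a pressure-weighted exploration draw."""
    phases = range(len(q_values))
    if rng.random() < epsilon:
        # Explore: sample a phase in proportion to its fuzzy priority
        # instead of uniformly at random.
        weights = [fuzzy_priority(p) for p in pressures]
        return rng.choices(phases, weights=weights, k=1)[0]
    # Exploit: pick the phase with the highest estimated Q-value.
    return max(phases, key=q_values.__getitem__)
```

Because the lowest priority is 0.1 rather than 0, every phase keeps a nonzero chance of being explored, so the guided draw biases exploration toward high-pressure phases without ruling any phase out. Whether the paper's system keeps such a floor is not stated in the abstract.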
Ćiprovski et al. (Tue,) studied this question.