Demand response is shifting toward continuous coordination of flexible demand, storage, and distributed generation across buildings and prosumer communities. Multi-agent reinforcement learning has gained attention because it can support decentralized execution under partial observability while still learning coordinated behavior through centralized training. This systematic review follows PRISMA 2020 guidance and synthesizes 70 peer-reviewed studies published between 2021 and 2025, covering building clusters, grid-aware district coordination, program-level aggregation, industrial demand response, and transactive energy mechanisms. The dominant evaluation context is grid-responsive building clusters, with growing reliance on benchmark environments that standardize interfaces and encourage reproducible multi-KPI reporting. Across methods, centralized training with decentralized execution is the prevailing pattern, often combined with attention-based critics or value factorization to handle heterogeneity and global rewards. Reward design and constraint handling emerge as primary determinants of stability, because objectives mix cost, peak, ramp, comfort, and emissions, while rebound and synchronized behavior are recurring risks. A descriptive and cross-variable quantitative synthesis shows that publication activity increased from 3 studies (4.3%) in 2021 to 28 studies (40.0%) in 2025, with the strongest concentration in 2024–2025. Grid-responsive building clusters accounted for 26 of the 70 studies (37.1%), actor–critic methods for 24 (34.3%), and CityLearn for 16 (22.9%).
Cost-based evaluation was reported in 64 studies (91.4%) and peak reduction in 55 (78.6%), whereas robustness testing appeared in only 16 (22.9%) and transferability or deployment realism in 11 (15.7%), indicating that evaluation remains much stronger for operational performance than for real-world generalization.
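The percentages quoted in the abstract are each a study count divided by the 70 reviewed studies. As a minimal sketch of that arithmetic, the snippet below recomputes them. The counts are taken from the abstract, while the dictionary labels and the `share` helper are illustrative names, not part of the review.

```python
# Recompute the shares quoted in the abstract from the raw study counts.
# Counts come from the abstract; labels and the helper name are illustrative.
TOTAL_STUDIES = 70

counts = {
    "published in 2021": 3,
    "published in 2025": 28,
    "grid-responsive building clusters": 26,
    "actor-critic methods": 24,
    "CityLearn": 16,
    "cost-based evaluation": 64,
    "peak reduction": 55,
    "robustness testing": 16,
    "transferability or deployment realism": 11,
}


def share(count, total=TOTAL_STUDIES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_STUDIES} = {pct:.1f}%")
```

The categories overlap (a single study can report both cost and peak reduction, for example), so the shares are not expected to sum to 100%.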
Suhaib Sajid
Bin Li
Bing Qi
Energies
North China Electric Power University
State Nuclear Power Technology Company (China)
Sajid et al. studied this question.
www.synapsesocial.com/papers/69fa986a04f884e66b53232e — DOI: https://doi.org/10.3390/en19092170