Prosumer communities, aggregations of residential and commercial entities equipped with distributed energy resources (DERs), including photovoltaic systems, battery storage, and flexible loads, are emerging as critical organizational units in decarbonising smart grid architectures. Managing these communities effectively requires balancing economic efficiency with equity, autonomy, and environmental sustainability, objectives that conventional centralized control methods and existing multi-agent reinforcement learning (MARL) implementations fail to address simultaneously. This article proposes a value-aligned hierarchical multi-agent reinforcement learning (VA-HMARL) framework, a formally unified architecture that embeds equity (Jain’s Fairness Index J ≥ 0.90), individual autonomy, and carbon sustainability as hard constraints within the MARL reward structure. The framework integrates (i) a multi-objective Value Alignment Module (VAM) combining economic, fairness, sustainability, and comfort objectives; (ii) attention-based implicit coordination for scalable agent interaction; and (iii) differentially private federated policy aggregation (ε = 1.0, δ = 10⁻⁵) for GDPR-compliant collaborative learning. Simulation of a 20-prosumer community modelled on the IEEE 33-bus feeder over 10 Monte Carlo runs (300 episodes each) demonstrates (i) a 6.2% energy cost reduction versus the Rule-Based baseline (p = 0.0004); (ii) a Jain’s Fairness Index of 0.912 ± 0.031 at policy convergence (final 50 episodes), satisfying the J ≥ 0.90 community equity floor; and (iii) an 18.0% reduction in CO₂ emissions. The loss in economic efficiency relative to performance-optimized MARL baselines is limited to 2.4%, within the 5% design target. These results establish VA-HMARL as a technically feasible and ethically grounded paradigm for autonomous decentralized energy governance.
Dragomir et al. studied this question.
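For readers unfamiliar with the equity metric, the standard definition of Jain's Fairness Index for n non-negative allocations x_1, ..., x_n is J = (Σ x_i)² / (n · Σ x_i²), which equals 1 when all allocations are equal and 1/n when a single member receives everything. The following is a minimal sketch of that definition and of a check against the J ≥ 0.90 floor. The function names, the choice of per-prosumer benefit as the allocated quantity, and the sample values are illustrative assumptions, not details taken from the VA-HMARL implementation.

```python
from typing import Sequence


def jain_fairness_index(allocations: Sequence[float]) -> float:
    """Standard Jain's Fairness Index: (sum x)^2 / (n * sum x^2).

    Assumes non-negative allocations (for example, per-prosumer cost
    savings; which quantity is used here is an assumption for illustration).
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("at least one allocation is required")
    if any(x < 0 for x in allocations):
        raise ValueError("allocations must be non-negative")
    total = sum(allocations)
    sum_of_squares = sum(x * x for x in allocations)
    if sum_of_squares == 0:
        # All allocations are zero: treated as perfectly equal by convention.
        return 1.0
    return (total * total) / (n * sum_of_squares)


def meets_equity_floor(allocations: Sequence[float], floor: float = 0.90) -> bool:
    """Check an allocation against the J >= 0.90 community equity floor."""
    return jain_fairness_index(allocations) >= floor


if __name__ == "__main__":
    # Hypothetical per-prosumer benefits, chosen only to exercise the formula.
    equal_split = [5.0, 5.0, 5.0, 5.0]
    single_winner = [20.0, 0.0, 0.0, 0.0]
    print(jain_fairness_index(equal_split), meets_equity_floor(equal_split))
    print(jain_fairness_index(single_winner), meets_equity_floor(single_winner))
```

Because the index is scale-invariant, multiplying every allocation by the same positive constant leaves J unchanged, so the floor constrains the shape of the distribution rather than its magnitude.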