Cooperative multiagent reinforcement learning (MARL) has been widely used in many practical applications. Despite its success, MARL faces a fundamental dilemma: because of partial observability, agents must choose between selecting the best action to maximize rewards and collectively acquiring more information by exploring novel states/actions. To address this issue, existing methods combine separate exploration and exploitation methods. However, these methods are always suboptimal and may fail to complete tasks. In this article, we theoretically prove the existence of a latent state that can guarantee optimal individual and global policies. Moreover, we prove that such a latent state can be approximately obtained from local observations. Based on this analysis, we propose a method named unified MARL (UMARL), a weighted value function factorization approach that unifies exploitation and exploration in one framework. Specifically, we design an agent representation network (ARN) and individual weighting networks (IWNs) to learn agents' unified representations and their credit weights. In addition, a latent state regularizer (LSR) is designed to encourage agents' representations to approximate the latent state. Extensive experiments show that UMARL achieves superior performance compared with 12 state-of-the-art methods on the m-step matrix game, level-based foraging (LBF), StarCraft II, and Google Research Football (GRF). The source code is available at: https://github.com/CrazyBayes/UMARL.
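The abstract does not spell out UMARL's mixing function. As a hedged illustration of what weighted value function factorization generally means, the sketch below assumes a joint value of the form Q_tot = sum_i w_i * Q_i with strictly positive weights, produced here by a softmax over hypothetical logits that merely stand in for the outputs of weighting networks such as the IWNs. This is not the authors' implementation; the names `mix`, `weight_logits`, `greedy_joint_action`, and `brute_force_joint_argmax` are hypothetical. The point it demonstrates is that positive weights keep Q_tot monotonic in every Q_i, so each agent picking its own greedy action also maximizes Q_tot, which a brute-force search over joint actions confirms.

```python
import math
from itertools import product


def softmax(xs):
    """Numerically stable softmax; every output is strictly positive."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def mix(individual_qs, weight_logits):
    """Hypothetical weighted factorization: Q_tot = sum_i w_i * Q_i, w_i > 0.

    The weight logits are an assumption standing in for learned,
    state-dependent weights; they are not taken from the paper.
    """
    weights = softmax(weight_logits)
    return sum(w * q for w, q in zip(weights, individual_qs))


def greedy_joint_action(q_tables):
    """Decentralized execution: each agent takes the argmax of its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def brute_force_joint_argmax(q_tables, weight_logits):
    """Centralized check: enumerate every joint action and maximize Q_tot."""
    best, best_val = None, -math.inf
    for joint in product(*(range(len(q)) for q in q_tables)):
        val = mix([q[a] for q, a in zip(q_tables, joint)], weight_logits)
        if val > best_val:
            best, best_val = joint, val
    return best, best_val


if __name__ == "__main__":
    # Two agents with three actions each; values are illustrative only.
    q_tables = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]
    weight_logits = [0.2, -0.3]
    print("greedy:", greedy_joint_action(q_tables))
    print("brute force:", brute_force_joint_argmax(q_tables, weight_logits))
```

Restricting the weights to be positive is one simple way to make per-agent greedy selection consistent with the joint maximizer; a negative weight would reverse that agent's preference ordering inside Q_tot and break this consistency.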
Kong et al. studied this question.
DOI: https://doi.org/10.1109/tnnls.2026.3673692
He Kong
Qianli Xing
Qi Wang
IEEE Transactions on Neural Networks and Learning Systems
Chinese University of Hong Kong
Jilin University
Inner Mongolia University