What question did this study set out to answer?

How can virtual coupling control of high-speed trains be enhanced using a predictive multi-agent reinforcement learning framework?

May 7, 2026 · Open Access

Predictive Mamba-Enhanced Multi-Agent Reinforcement Learning Control for Virtual Coupling of High-Speed Trains

Key Points

Proposed the Predictive Mamba-based Multi-Agent Soft Actor-Critic (PM-MASAC) framework.
Developed a Mamba-based state prediction module, embedded in the centralized critic, that models historical state sequences to improve value estimation.
Introduced a multi-agent aggregated prioritized experience replay (PER) mechanism to make better use of critical cooperative samples and stabilize training.
Designed a hierarchical local-global reward structure for improved coordination.
PM-MASAC showed superior robustness compared with baseline MARL methods under realistic railway operating conditions.
Velocity tracking errors maintained within 3%.
Spacing tracking errors kept within 1%.
Steady-state formation success rate exceeded 95.7% in the training environment.
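The hierarchical local-global reward listed above can be illustrated with a minimal Python sketch. The paper's actual reward terms and weights are not given in this summary, so everything below is an assumption: each agent is treated as a following train, the local term penalizes its own relative speed and spacing errors, the global term is the worst spacing error in the platoon, and the blend weight `lam` and the helper names (`FollowerState`, `hierarchical_rewards`, `within_tolerance`) are hypothetical. The 3% and 1% defaults in `within_tolerance` mirror the reported error bounds, but the paper's exact metric definition may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FollowerState:
    speed: float      # current speed (m/s)
    ref_speed: float  # target speed (m/s)
    gap: float        # distance to the preceding train (m)
    ref_gap: float    # desired virtual-coupling spacing (m)


def rel_err(actual: float, reference: float) -> float:
    """Relative tracking error |actual - reference| / |reference|."""
    return abs(actual - reference) / abs(reference)


def local_reward(s: FollowerState, w_speed: float = 1.0, w_gap: float = 1.0) -> float:
    # Individual tracking term (assumed form): penalize this agent's own
    # relative speed and spacing errors.
    return -(w_speed * rel_err(s.speed, s.ref_speed) + w_gap * rel_err(s.gap, s.ref_gap))


def global_reward(states: List[FollowerState]) -> float:
    # Formation term shared by all agents (assumed form): the worst
    # spacing error anywhere in the platoon.
    return -max(rel_err(s.gap, s.ref_gap) for s in states)


def hierarchical_rewards(states: List[FollowerState], lam: float = 0.7) -> List[float]:
    # Hypothetical blend: lam weights individual tracking,
    # (1 - lam) weights platoon-level coordination.
    g = global_reward(states)
    return [lam * local_reward(s) + (1.0 - lam) * g for s in states]


def within_tolerance(s: FollowerState, speed_tol: float = 0.03, gap_tol: float = 0.01) -> bool:
    # Hypothetical per-step check against 3% speed / 1% spacing bounds.
    return rel_err(s.speed, s.ref_speed) <= speed_tol and rel_err(s.gap, s.ref_gap) <= gap_tol


# Usage: two following trains, the second slightly slow and 5 m too far back.
platoon = [
    FollowerState(speed=80.0, ref_speed=80.0, gap=500.0, ref_gap=500.0),
    FollowerState(speed=78.0, ref_speed=80.0, gap=505.0, ref_gap=500.0),
]
rewards = hierarchical_rewards(platoon)  # approximately [-0.003, -0.0275]
```

Sharing the global term across all agents means a perfectly tracking train still receives a small penalty when another train drifts, which is one simple way to push individually trained policies toward formation-level coordination.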

Abstract

Virtual coupling control of trains is a promising technology for improving railway capacity and operational efficiency. However, existing multi-agent reinforcement learning (MARL) approaches struggle to capture long-sequence temporal dependencies among train states in complex multi-train interaction scenarios, resulting in limited robustness and coordination stability. To address this issue, this paper proposes a Predictive Mamba-based Multi-Agent Soft Actor–Critic (PM-MASAC) framework. A Mamba-based state prediction module is embedded into the centralized Critic network to model historical state sequences and generate predictive state representations, thereby enhancing value estimation accuracy. In addition, a multi-agent aggregated prioritized experience replay (PER) mechanism is introduced to improve the utilization of critical cooperative samples and stabilize training. A hierarchical local–global reward structure is further designed to ensure individual tracking performance while promoting overall formation coordination. Experimental results under realistic railway operating conditions demonstrate that PM-MASAC achieves superior robustness compared with baseline MARL methods. Velocity and spacing tracking errors are maintained within 3% and 1%, respectively, and the steady-state formation success rate exceeds 95.7% in the training environment.
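To show the kind of sequence model the abstract refers to, here is a minimal, pure-Python sketch of a Mamba-style selective state space scan over a history of train states. It is an illustrative simplification, not the authors' module: it omits Mamba's input projections, convolution, gating, and training, and the class name `SelectiveSSM`, the random initialization, and the feature layout are assumptions. What it does show is the "selective" idea: the step size Δ and the vectors B and C are computed from the current input, so the recurrence h ← exp(Δ·A)·h + Δ·B·x can decide at each step how much history to retain.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softplus(z: float) -> float:
    # Numerically stable softplus; keeps the step size delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SelectiveSSM:
    """Simplified single-layer selective SSM scan (hypothetical sketch)."""

    def __init__(self, d_model: int, d_state: int, seed: int = 0):
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.d_model, self.d_state = d_model, d_state
        # Fixed negative diagonal A per channel, so exp(delta * A) lies in (0, 1).
        self.A = [[-(n + 1.0) for n in range(d_state)] for _ in range(d_model)]
        self.W_delta = rand_matrix(d_model, d_model)
        self.b_delta = [0.0] * d_model
        self.W_B = rand_matrix(d_state, d_model)
        self.W_C = rand_matrix(d_state, d_model)

    def scan(self, xs: Sequence[Sequence[float]]) -> List[Vector]:
        """Run the causal recurrence over a sequence; return one output per step."""
        h = [[0.0] * self.d_state for _ in range(self.d_model)]
        ys: List[Vector] = []
        for x in xs:
            # Input-dependent ("selective") parameters from the current step x_t.
            delta = [softplus(z + b) for z, b in zip(matvec(self.W_delta, x), self.b_delta)]
            B = matvec(self.W_B, x)
            C = matvec(self.W_C, x)
            y: Vector = []
            for d in range(self.d_model):
                for n in range(self.d_state):
                    # Discretized update: h <- exp(delta*A) * h + delta * B * x
                    h[d][n] = math.exp(delta[d] * self.A[d][n]) * h[d][n] + delta[d] * B[n] * x[d]
                y.append(sum(C[n] * h[d][n] for n in range(self.d_state)))
            ys.append(y)
        return ys


# Usage: hypothetical normalized (speed, gap) history for one train.
history = [[0.80, 0.50], [0.79, 0.51], [0.78, 0.505]]
ssm = SelectiveSSM(d_model=2, d_state=4)
outputs = ssm.scan(history)
summary = outputs[-1]  # candidate history representation for a critic
```

In a centralized critic, `summary` (or a learned prediction head on top of it, not shown) could be concatenated with the joint observation and actions before estimating Q; that wiring is likewise an assumption rather than the paper's design.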
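The multi-agent aggregated PER mechanism can be sketched as standard proportional prioritized replay in which each joint transition stores a single priority computed from all agents' TD errors. How the paper aggregates the per-agent errors is not stated in this summary; the mean (or optionally the max) used below, the class name `AggregatedPER`, and the default `alpha`/`beta` values are assumptions. Sampling is O(n) for readability; a sum tree is the usual choice for large buffers.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


class AggregatedPER:
    """Proportional prioritized replay with one aggregated priority per joint transition."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6,
                 aggregate: str = "mean", seed: int = 0):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.aggregate = aggregate  # "mean" or "max" over agents (assumed options)
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0
        self.rng = random.Random(seed)

    def _priority(self, td_errors: Sequence[float]) -> float:
        # Collapse per-agent |TD error| into one scalar, then apply alpha.
        abs_err = [abs(e) for e in td_errors]
        agg = max(abs_err) if self.aggregate == "max" else sum(abs_err) / len(abs_err)
        return (agg + self.eps) ** self.alpha

    def add(self, joint_transition: Any, td_errors: Optional[Sequence[float]] = None) -> None:
        # Without TD errors, use the current max priority so the sample is replayed soon.
        p = max(self.priorities, default=1.0) if td_errors is None else self._priority(td_errors)
        if len(self.data) < self.capacity:
            self.data.append(joint_transition)
            self.priorities.append(p)
        else:
            # Ring buffer: overwrite the oldest entry.
            self.data[self.pos] = joint_transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4) -> Tuple[List[int], List[Any], List[float]]:
        # Assumes a non-empty buffer.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        n = len(self.data)
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # normalized by the batch maximum so they lie in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        return idx, [self.data[i] for i in idx], [w / max_w for w in weights]

    def update(self, indices: Sequence[int], td_errors_per_sample: Sequence[Sequence[float]]) -> None:
        # Refresh priorities after the critic recomputes per-agent TD errors.
        for i, errs in zip(indices, td_errors_per_sample):
            self.priorities[i] = self._priority(errs)


# Usage: a quiet joint transition vs. one where the agents disagree strongly.
buf = AggregatedPER(capacity=3, seed=0)
buf.add("t0", [0.0, 0.0])
buf.add("t1", [2.0, -4.0])
indices, batch, is_weights = buf.sample(8)
```

Storing one priority per joint transition keeps the agents' replayed experiences synchronized, which a centralized critic needs; per-agent buffers would sample mismatched time steps.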

Authors

Han Hu

Qingsheng Feng

Zhun Han

Journals

Electronics

Institutions

Tongji University

Dalian Jiaotong University
