Beyond-Visual-Range (BVR) air combat is the dominant paradigm of future aerial warfare, characterized by intense adversarial dynamics and extended engagement horizons. However, existing research is largely confined to simplified simulations and isolated maneuver decision-making, and does not address the optimization challenges that arise from the deep coupling between tactical maneuvering and attack decisions. To address these challenges, this paper proposes a Hierarchical Hybrid Action Proximal Policy Optimization (HHAPPO) algorithm. A high-fidelity environment incorporating six-degree-of-freedom (6-DOF) aircraft dynamics, a radar detection probability model, and missile models is established to serve as a testbed. First, a hybrid action space integrating continuous and discrete variables is designed, so that maneuvering and attack decisions are optimized jointly rather than separately. Second, to mitigate the convergence difficulties inherent in 6-DOF flight dynamics control, a hierarchical control architecture with a behavior cloning warm-start mechanism is proposed: data from a PID controller is used to pre-train the low-level agent before reinforcement learning fine-tuning. Finally, a kill-chain-based reward shaping mechanism and objective function are developed to accelerate convergence and improve performance. Simulation results demonstrate that the proposed algorithm learns coupled maneuver-attack strategies against a Finite State Machine (FSM) expert system, achieving significantly higher win rates and better convergence behavior than traditional PPO baselines.
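The abstract does not specify how the hybrid action space is parameterized, so the following is only a minimal sketch of one common way to combine continuous and discrete actions under PPO: continuous maneuver commands are drawn from independent Gaussians, the discrete attack decision from a categorical distribution, and the joint log-probability is the sum of the two, which then feeds the standard clipped surrogate. All names here (`sample_hybrid_action`, `fire_logits`, `ppo_clipped_term`) and the independence assumption are illustrative and are not taken from the paper.

```python
import math
import random


def gaussian_log_prob(x, mean, std):
    """Log-density of a scalar Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(means, stds, fire_logits, rng):
    """Sample continuous maneuver commands and a discrete attack decision.

    Assumes (hypothetically) that the continuous and discrete parts are
    independent given the state, so their log-probabilities add.
    Returns (continuous, discrete, joint_log_prob).
    """
    continuous = [rng.gauss(m, s) for m, s in zip(means, stds)]
    probs = softmax(fire_logits)
    discrete = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    log_prob = sum(
        gaussian_log_prob(x, m, s) for x, m, s in zip(continuous, means, stds)
    )
    log_prob += math.log(probs[discrete])
    return continuous, discrete, log_prob


def ppo_clipped_term(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample, using the joint log-prob."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical continuous commands and a hold/fire decision.
    action = sample_hybrid_action([0.0, 0.0, 0.5], [0.1, 0.1, 0.2], [0.0, 1.0], rng)
    print(action)
```

Because the two parts share a single probability ratio, one policy-gradient update adjusts the maneuver and attack heads together, which is the sense in which such a formulation optimizes them jointly.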
Qiang Guo
H Liu
Yongliang Tian
Defence Technology
City University of Hong Kong
Beihang University
Guo et al. studied this question.
www.synapsesocial.com/papers/69fd7f0dbfa21ec5bbf075f8 — DOI: https://doi.org/10.1016/j.dt.2026.04.019