What does this research mean for the field?

The Hierarchical Hybrid Action Proximal Policy Optimization (HHAPPO) algorithm achieves significantly higher win rates and better convergence than traditional PPO baselines in beyond-visual-range air combat simulations by effectively learning coupled maneuver-attack strategies. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to develop a framework that enhances decision-making in beyond-visual-range air combat through advanced reinforcement learning algorithms.

May 8, 2026 · Open Access

A hierarchical hybrid action reinforcement learning framework for beyond-visual-range air combat decision-making

Key Points

Proposed a Hierarchical Hybrid Action Proximal Policy Optimization (HHAPPO) algorithm.
Established a high-fidelity environment incorporating 6-DOF dynamics, radar detection, and missile models.
Designed a hybrid action space with continuous and discrete variables and utilized behavior cloning for pre-training.
The HHAPPO algorithm achieved significantly higher win rates against a Finite State Machine expert system.
Demonstrated improved convergence characteristics compared to traditional Proximal Policy Optimization (PPO) baselines.
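The hybrid action space in the key points can be illustrated with a minimal sketch. It assumes (the layout is not specified on this page) a diagonal Gaussian head for continuous maneuver commands, a categorical head for the discrete attack decision, and conditional independence between the two heads, so that their log-probabilities add inside the PPO clipped surrogate. All function names are hypothetical, and NumPy stands in for a deep-learning framework.

```python
import numpy as np

# Hypothetical sketch: the head layout below is an assumption, not the paper's design.

def softmax(logits):
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gaussian_log_prob(x, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    var = np.exp(2.0 * log_std)
    return float(np.sum(-0.5 * (x - mean) ** 2 / var - log_std - 0.5 * np.log(2.0 * np.pi)))

def categorical_log_prob(k, logits):
    """Log-probability of discrete choice k under softmax(logits)."""
    z = logits - np.max(logits)
    return float(z[k] - np.log(np.sum(np.exp(z))))

def hybrid_log_prob(cont, disc, mean, log_std, logits):
    # Independence assumption: log pi(a_c, a_d | s) = log pi_c(a_c | s) + log pi_d(a_d | s)
    return gaussian_log_prob(cont, mean, log_std) + categorical_log_prob(disc, logits)

def sample_hybrid_action(mean, log_std, logits, rng):
    """Sample continuous maneuver commands and a discrete attack choice together."""
    cont = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    disc = int(rng.choice(len(logits), p=softmax(logits)))
    return cont, disc, hybrid_log_prob(cont, disc, mean, log_std, logits)

def ppo_clipped_term(new_logp, old_logp, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate using the joint hybrid log-probability."""
    ratio = np.exp(new_logp - old_logp)
    return float(min(ratio * advantage, np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage))

# Usage: three continuous maneuver channels and two attack options (e.g. hold / fire).
rng = np.random.default_rng(0)
cont, disc, logp = sample_hybrid_action(np.zeros(3), np.zeros(3), np.array([0.0, 0.0]), rng)
```

Because both heads share one probability ratio here, a single clipped update adjusts maneuver and attack choices together, which is one simple reading of synchronous optimization; designs that keep a separate ratio per head are an equally plausible alternative.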

Abstract

Beyond-Visual-Range (BVR) air combat is the dominant paradigm of future aerial warfare, characterized by intense adversarial dynamics and extended engagement horizons. However, existing research is predominantly confined to simplified simulations and isolated maneuver decision-making, and fails to address the complex optimization challenges arising from the deep coupling between tactical maneuvering and attack decisions. To address these challenges, this paper proposes a Hierarchical Hybrid Action Proximal Policy Optimization (HHAPPO) algorithm. A high-fidelity environment incorporating six-degree-of-freedom (6-DOF) aircraft dynamics, a radar detection probability model, and missile models is established to serve as a robust testbed. First, a hybrid action space integrating continuous and discrete variables is designed, enabling both types of action to be optimized synchronously for superior tactical synergy. Second, to mitigate the convergence challenges inherent in 6-DOF flight dynamics control, a hierarchical control architecture with a behavior cloning warm-start mechanism is proposed: data from a PID controller are used to pre-train the low-level agent before reinforcement learning fine-tuning. Finally, a kill-chain-based reward shaping mechanism and objective function are developed to accelerate convergence and enhance performance. Simulation results demonstrate that the proposed algorithm effectively learns coupled maneuver-attack strategies against a Finite State Machine (FSM) expert system, achieving significantly higher win rates and better convergence characteristics than traditional PPO baselines.
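The behavior cloning warm-start described in the abstract amounts to supervised regression of the low-level policy onto PID-labelled data before any reinforcement learning. The sketch below assumes a single control channel whose state is summarized by the tracking error, its integral, and its derivative; the gains KP, KI, KD and all function names are hypothetical, and a linear policy stands in for whatever network the low-level agent actually uses, so that the cloned weights can be checked against the teacher's gains.

```python
import numpy as np

# Hypothetical teacher gains for one control channel (illustrative values only).
KP, KI, KD = 1.2, 0.1, 0.4

def pid_teacher(features):
    """PID law u = KP*e + KI*int(e) + KD*de/dt applied to each row (e, int_e, de)."""
    return features @ np.array([KP, KI, KD])

def collect_demonstrations(n, rng):
    """Label synthetic tracking states with the PID teacher's actions.

    States are sampled independently for illustration, not from a simulated trajectory.
    """
    features = rng.normal(size=(n, 3))
    return features, pid_teacher(features)

def behavior_clone(features, actions, epochs=500, lr=0.1):
    """Fit a linear low-level policy by gradient descent on the mean squared error."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        residual = features @ w - actions
        grad = 2.0 * features.T @ residual / len(actions)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X, y = collect_demonstrations(1000, rng)
w_init = behavior_clone(X, y)  # warm-start weights for later RL fine-tuning
```

In a hierarchical setup the cloned parameters would only initialize the low-level policy; PPO fine-tuning then continues from this warm start rather than from random weights, which is the convergence benefit the abstract attributes to the mechanism.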

Authors

Qiang Guo

H. Liu

Yongliang Tian

Journals

Defence Technology

Institutions

City University of Hong Kong

Beihang University
