What does this research mean for the field?

The GRPO-MDP framework enhances UAV path planning by improving exploration flexibility and safety in dynamic environments, outperforming existing methods in success rate, path efficiency, and safety. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to address uncertainties in UAV path planning by improving trajectory exploration and safety.

February 26, 2026 · Open Access

Enhancing UAV path planning with diffusion models and group relative policy optimization

Key Points

Developed the GRPO-MDP framework, which combines deep reinforcement learning (DRL) with multimodal diffusion strategies.
Used a Denoising Diffusion Probabilistic Model (DDPM) to generate diverse action samples.
Implemented Group Relative Policy Optimization (GRPO), which constructs advantage functions by comparing trajectories within a group, removing the need for a value network.
Introduced a hindsight trajectory relabeling mechanism that turns failure experiences into informative learning signals.
Employed a Control Barrier Function (CBF) as a safety filter to enforce hard safety constraints during planning.
Outperformed existing methods in success rate.
Achieved better path efficiency than previous techniques.
Achieved higher safety than existing methods in dynamic environments.
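To make the group-relative advantage idea concrete, here is a minimal Python sketch. It assumes the group-normalized form used in the original GRPO formulation: each trajectory's return minus the group mean, divided by the group standard deviation. The paper's exact advantage function, reward design, and group size are not given on this page, and the function name `group_relative_advantages` is hypothetical.

```python
from statistics import fmean, pstdev


def group_relative_advantages(returns: list[float], eps: float = 1e-8) -> list[float]:
    """Score each trajectory relative to the others rolled out from the same state.

    Assumed form (hypothetical, not confirmed by the paper):
        A_i = (R_i - mean(R)) / (std(R) + eps)
    No value network is involved; the group mean acts as the baseline.
    """
    if not returns:
        raise ValueError("group must contain at least one trajectory")
    mu = fmean(returns)          # group baseline replaces a learned critic
    sigma = pstdev(returns)      # population std over the group
    return [(r - mu) / (sigma + eps) for r in returns]


# Four trajectories sampled from the same start state (illustrative returns).
print(group_relative_advantages([10.0, 2.0, 6.0, 6.0]))
```

Under this assumption, trajectories that beat their group's average receive positive advantages and the rest receive negative ones, so the policy update is scaled without a learned value baseline. A group whose trajectories all earn the same return contributes zero advantage.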

Abstract

This paper addresses the uncertainties arising from dynamic environments in autonomous path planning tasks for unmanned aerial vehicles (UAVs). Existing Deep Reinforcement Learning (DRL) methods struggle to represent multiple feasible trajectories and often neglect failure experiences. To overcome these limitations, we propose a novel framework termed GRPO-MDP (Group Relative Policy Optimization with a Multimodal Diffusion Strategy). The framework leverages a Denoising Diffusion Probabilistic Model (DDPM) to generate diverse action samples during training. By integrating DRL with multimodal diffusion strategies, GRPO-MDP enables UAVs to explore multiple feasible trajectories from the same state, enhancing exploration flexibility. The proposed Group Relative Policy Optimization (GRPO) method constructs advantage functions through relative comparisons within groups of trajectories, thereby eliminating the need for a value network and improving training stability. In addition, a hindsight trajectory relabeling mechanism is introduced to convert failure experiences into informative learning signals by incorporating virtual target and safety boundary modes. To guarantee real-time safety, a Control Barrier Function (CBF) is employed as a safety filter, ensuring that hard safety constraints are strictly enforced during path planning. Experimental results demonstrate that the proposed GRPO-MDP framework outperforms existing methods in terms of success rate, path efficiency, and safety in dynamic environments.
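The CBF safety filter described in the abstract can be illustrated with a deliberately simplified sketch. It assumes 2D single-integrator dynamics (the UAV is commanded by velocity), a single circular obstacle, and the barrier h(x) = ||x - c||^2 - r^2; none of these modeling choices come from the paper, and `cbf_filter` is a hypothetical name. With one affine constraint, the usual quadratic program (stay as close as possible to the planner's command subject to dh/dt + alpha * h >= 0) has a closed-form solution: project the nominal command onto a half-plane.

```python
Vec = tuple[float, float]


def cbf_filter(x: Vec, u_nom: Vec, center: Vec, radius: float, alpha: float = 1.0) -> Vec:
    """Minimally modify u_nom so that dh/dt + alpha * h >= 0 for one circular obstacle.

    Assumptions (hypothetical, not from the paper):
      - dynamics: x_dot = u (2D single integrator)
      - barrier:  h(x) = ||x - c||^2 - r^2  (h >= 0 means outside the obstacle)
    Solves  min ||u - u_nom||^2  s.t.  a . u >= b  by projecting onto the half-plane.
    """
    dx, dy = x[0] - center[0], x[1] - center[1]
    h = dx * dx + dy * dy - radius * radius
    a = (2.0 * dx, 2.0 * dy)       # gradient of h, so dh/dt = a . u
    b = -alpha * h                 # CBF condition rewritten as a . u >= b
    slack = a[0] * u_nom[0] + a[1] * u_nom[1] - b
    if slack >= 0.0:
        return u_nom               # nominal command already satisfies the constraint
    norm_sq = a[0] ** 2 + a[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("state at obstacle center; barrier gradient is zero")
    step = -slack / norm_sq        # smallest correction along a that restores a . u = b
    return (u_nom[0] + step * a[0], u_nom[1] + step * a[1])


# UAV at the origin heading toward an obstacle centered at (2, 0) with radius 1.
print(cbf_filter((0.0, 0.0), (1.0, 0.0), (2.0, 0.0), 1.0, alpha=1.0))
```

In this example the nominal command (1, 0) is slowed to (0.75, 0), while a command pointing away from the obstacle passes through unchanged. With several obstacles or actuator limits the constraints no longer admit this closed form, and a QP solver is typically used instead.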

Cite This Study

Liu et al. (2026) studied this question.

synapsesocial.com/papers/699fe41d95ddcd3a253e85a5
https://doi.org/10.1016/j.aej.2026.02.024