What question did this study set out to answer?

This research aims to address the challenge of sparse rewards in reinforcement learning for manipulator grasping tasks.

March 7, 2026 · Open Access

PHER: A Method for Solving the Sparse Reward Problem of a Manipulator Grasping Task

Key Points

The study targets the sparse-reward problem in reinforcement learning for manipulator grasping tasks.
The grasping task model is trained with off-policy reinforcement learning.
Hindsight experience replay (HER) relabels the states the agent actually reached, so failed episodes still yield a learning signal.
A prioritized sampling method selects relabeled transitions from the experience replay buffer.
Prioritized sampling is combined with various off-policy reinforcement learning algorithms for training.
Hindsight experience replay with prioritization (PHER) converged significantly faster than standard HER.
Prioritized sampling also improved data utilization.
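
To make the relabeling step concrete, here is a minimal sketch of HER-style relabeling. It is not the authors' implementation: the `Transition` fields, the `sparse_reward` convention (0 on success, -1 otherwise), the distance tolerance, and the choice of the "final" relabeling strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical goal-conditioned transition (field names are assumptions)."""
    state: Vec
    action: int
    achieved: Vec   # position actually reached after taking the action
    goal: Vec       # goal the agent was trying to reach
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the goal, -1.0 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_final(episode: List[Transition]) -> List[Transition]:
    """Relabel an episode with the last achieved state as the goal.

    This is the "final" strategy: pretend the state the agent ended up in
    was the goal all along, then recompute every reward against it.
    """
    new_goal = episode[-1].achieved
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.achieved, new_goal))
        for t in episode
    ]


# A failed episode: the gripper never gets near the real goal (1.0, 1.0).
real_goal = (1.0, 1.0)
positions = [(0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
episode = [
    Transition(state=p, action=0, achieved=p, goal=real_goal,
               reward=sparse_reward(p, real_goal))
    for p in positions
]
relabeled = relabel_final(episode)
original_rewards = [t.reward for t in episode]
relabeled_rewards = [t.reward for t in relabeled]
```

In this toy episode every original reward is -1.0, while the relabeled copy ends with a reward of 0.0, which is the extra learning signal HER extracts from a failure.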

Abstract

Off-policy reinforcement learning is commonly used to train the grasping task model of a manipulator. During training, however, it is difficult to collect enough successful experience and reward to learn from; that is, rewards are sparse. Hindsight experience replay (HER) lets the agent relabel the states it actually reached as goals, but not all failed experiences contribute equally to learning. Given the many transitions the environment generates during operation, sampling uniformly at random from the experience replay buffer leads to low data utilization and slow convergence. This paper proposes sampling the relabeled transitions with a prioritized sampling method and combines it with various off-policy reinforcement learning algorithms for training in simulated environments. Prioritized sampling gives the agent access to the more important transitions earlier, which accelerates convergence. The results show that hindsight experience replay with prioritization (PHER) converges significantly faster than the other methods compared.
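
The abstract does not say how priorities are computed, so the following is only a sketch of one common choice: proportional prioritization by TD-error magnitude, as in standard prioritized experience replay. The class name `PrioritizedBuffer`, the parameters `alpha`, `beta`, and `eps`, and the use of TD error as the priority signal are assumptions, not details taken from the paper.

```python
import random


class PrioritizedBuffer:
    """Toy replay buffer with proportional prioritized sampling (illustrative)."""

    def __init__(self, alpha: float = 0.6, eps: float = 1e-3):
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []

    def add(self, transition) -> None:
        # New transitions get the current maximum priority so that each one
        # is likely to be replayed at least once before being down-weighted.
        self.priorities.append(max(self.priorities, default=1.0))
        self.items.append(transition)

    def sample(self, batch_size: int, beta: float = 0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling; they are normalized by the batch maximum.
        n = len(self.items)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return indices, [self.items[i] for i in indices], weights

    def update(self, indices, td_errors) -> None:
        # After a learning step, refresh priorities from the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps


buf = PrioritizedBuffer()
buf.add("uninformative transition")
buf.add("surprising transition")
buf.update([0, 1], [0.0, 10.0])   # pretend TD errors from a critic

rng = random.Random(0)
indices, batch, weights = buf.sample(1000, rng=rng)
surprising_count = indices.count(1)
```

With these made-up TD errors, the high-error transition dominates the sampled batch, which is the sense in which prioritization lets the agent see the more important transitions earlier. A production buffer would typically use a sum-tree rather than recomputing probabilities on every call.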
