What type of study is this?

This is a quantitative study.

October 13, 2025 · Open Access

Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning

Key Points

DEMO3 improves data efficiency by an average of 40% over state-of-the-art approaches on long-horizon manipulation tasks.
A bi-phasic training scheme, combined with multi-stage dense reward learning and world model learning, mitigates the exploration challenge when learning from visual inputs.
The method is validated on 16 sparse-reward tasks spanning four domains, including humanoid visual control tasks, using as few as five demonstrations.
On particularly difficult tasks, the data-efficiency improvement over state-of-the-art approaches reaches 70%.

Abstract

Long-horizon tasks in robotic manipulation present significant challenges in reinforcement learning (RL) due to the difficulty of designing dense reward functions and effectively exploring the expansive state-action space. However, despite a lack of dense rewards, these tasks often have a multi-stage structure, which can be leveraged to decompose the overall objective into manageable subgoals. In this work, we propose DEMO3, a framework that exploits this structure for efficient learning from visual inputs. Specifically, our approach incorporates multi-stage dense reward learning, a bi-phasic training scheme, and world model learning into a carefully designed demonstration-augmented RL framework that strongly mitigates the challenge of exploration in long-horizon tasks. Our evaluations demonstrate that our method improves data-efficiency by an average of 40% and by 70% on particularly difficult tasks compared to state-of-the-art approaches. We validate this across 16 sparse-reward tasks spanning four domains, including challenging humanoid visual control tasks using as few as five demonstrations.
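
To make the idea of exploiting multi-stage structure concrete, the following is a minimal illustrative sketch, not the DEMO3 implementation. The abstract does not specify the form of the learned reward, so everything here is an assumption: the function name `staged_dense_reward`, the `progress` argument (a stand-in for whatever learned within-stage score a method might produce), and the choice to normalize the result to the range [0, 1].

```python
def staged_dense_reward(stage: int, progress: float, num_stages: int) -> float:
    """Hypothetical dense reward built from a task's stage structure.

    stage      -- number of stages already completed (0..num_stages)
    progress   -- assumed within-stage score in [0, 1], e.g. from a learned model
    num_stages -- total number of stages in the task
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if not 0 <= stage <= num_stages:
        raise ValueError("stage must lie in [0, num_stages]")
    if stage == num_stages:
        # All stages done: the task is solved, so return the maximum reward.
        return 1.0
    # Clamp the within-stage score so it can never exceed the next stage boundary.
    progress = min(max(progress, 0.0), 1.0)
    # Completed stages set the baseline; progress fills the gap to the next stage.
    return (stage + progress) / num_stages
```

Under these assumptions, a sparse success signal becomes a reward that grows steadily as the agent advances through subgoals, which is the kind of signal that can ease exploration in long-horizon tasks.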

Authors

Adrià López Escoriza

Nicklas Hansen

Siqi Tao
