October 3, 2025 | Open Access

DINOv3-Diffusion Policy: Self-Supervised Large Visual Model for Visuomotor Diffusion Policy Learning

Key Points

DINOv3's self-supervised features improve sample efficiency and robustness in robotic tasks.
Finetuned DINOv3 matches or exceeds ResNet-18 performance on several tasks, including Can and Push-T.
Frozen DINOv3 remains competitive, indicating strong transferable priors.
Compared with the ResNet-18 backbone, DINOv3 achieves up to a 10% absolute increase in test-time success rate on challenging manipulation tasks such as Can.

Abstract

This paper evaluates DINOv3, a recent large-scale self-supervised vision backbone, for visuomotor diffusion policy learning in robotic manipulation. We investigate whether a purely self-supervised encoder can match or surpass conventional supervised ImageNet-pretrained backbones (e.g., ResNet-18) under three regimes: training from scratch, frozen, and finetuned. Across four benchmark tasks (Push-T, Lift, Can, Square) using a unified FiLM-conditioned diffusion policy, we find that (i) finetuned DINOv3 matches or exceeds ResNet-18 on several tasks, (ii) frozen DINOv3 remains competitive, indicating strong transferable priors, and (iii) self-supervised features improve sample efficiency and robustness. These results support self-supervised large visual models as effective, generalizable perceptual front-ends for action diffusion policies, motivating further exploration of scalable label-free pretraining in robotic manipulation. Compared with a ResNet-18 backbone, DINOv3 achieves up to a 10% absolute increase in test-time success rate on challenging tasks such as Can, and on-par performance on Lift, Push-T, and Square.
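The three encoder regimes and the FiLM conditioning named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the `EncoderRegime` fields, the `film_params` helper, and the toy weight matrices are hypothetical names and values chosen for illustration. The only part taken from the general literature is the FiLM rule itself, which scales and shifts each feature by parameters computed from a conditioning vector (here standing in for the visual encoder's output).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class EncoderRegime:
    """How the visual encoder is initialised and whether it is updated."""
    name: str
    load_pretrained: bool  # start from pretrained weights?
    trainable: bool        # update encoder weights while training the policy?


# The three regimes named in the abstract (field names are hypothetical).
REGIMES = {
    "scratch": EncoderRegime("scratch", load_pretrained=False, trainable=True),
    "frozen": EncoderRegime("frozen", load_pretrained=True, trainable=False),
    "finetuned": EncoderRegime("finetuned", load_pretrained=True, trainable=True),
}


def film_params(
    cond: Sequence[float],
    w_gamma: Sequence[Sequence[float]],
    w_beta: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Map a conditioning vector to (gamma, beta) with two bias-free linear maps."""
    gamma = [sum(w * c for w, c in zip(row, cond)) for row in w_gamma]
    beta = [sum(w * c for w, c in zip(row, cond)) for row in w_beta]
    return gamma, beta


def film(
    features: Sequence[float],
    gamma: Sequence[float],
    beta: Sequence[float],
) -> List[float]:
    """Feature-wise linear modulation: y_i = gamma_i * x_i + beta_i."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


if __name__ == "__main__":
    # Toy conditioning vector standing in for encoder features (made-up values).
    cond = [1.0, 2.0]
    w_gamma = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w_beta = [[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]]
    gamma, beta = film_params(cond, w_gamma, w_beta)
    print(film([2.0, 2.0, 2.0], gamma, beta))
```

In this framing, the frozen and finetuned regimes differ only in whether the encoder producing the conditioning vector receives gradient updates, while training from scratch additionally discards the pretrained initialisation.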

Authors

T. I. Egbe

Peng Wang

Zhihao Guo