Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. Although these models have garnered considerable enthusiasm and investment, meaningful evaluation of their real-world performance remains a challenge, which limits the pace of development and inhibits a nuanced understanding of current capabilities. Here, we rigorously evaluated multitask robot manipulation policies, referred to as large behavior models, obtained by extending the diffusion policy paradigm across a corpus of simulated and real-world robot data. We proposed and validated an evaluation pipeline for analyzing the capabilities of these models with statistical confidence. We compared these models against single-task baselines through blind, randomized trials in a controlled setting, in both simulation and real-world experiments. We found that multitask pretraining made the policies more successful and robust and allowed complex new tasks to be taught more quickly, using a fraction of the data required by single-task baselines. Moreover, performance increased predictably as pretraining scale and diversity grew.
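The abstract does not say which statistics the evaluation pipeline uses. As a purely illustrative sketch of what "statistical confidence" on binary task success can look like, the snippet below computes a Wilson score interval for each policy's success rate and checks whether two intervals are separated. The function names, the 95% level (z = 1.96), and the trial counts in the usage note are assumptions for illustration, not the authors' method or data.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%).

    Illustrative only: assumes each rollout is an independent pass/fail trial.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    # Center is pulled toward 0.5, which keeps the interval sensible near 0 or 1.
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_separated(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Conservative comparison: True only if the two Wilson intervals do not overlap.

    Each argument is a hypothetical (successes, trials) pair for one policy.
    """
    lo_a, hi_a = wilson_interval(*a)
    lo_b, hi_b = wilson_interval(*b)
    return lo_a > hi_b or lo_b > hi_a


if __name__ == "__main__":
    # Hypothetical counts: a multitask policy vs. a single-task baseline, 50 trials each.
    print(wilson_interval(45, 50), wilson_interval(30, 50))
    print(intervals_separated((45, 50), (30, 50)))
```

With these made-up counts the intervals are roughly (0.79, 0.96) and (0.46, 0.72), so the check reports a separation. Non-overlapping intervals are a conservative criterion; a dedicated two-sample test or a Bayesian comparison of success rates would usually have more power for the same number of trials.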
Barreiros et al. studied this question.
www.synapsesocial.com/papers/69e1cecc5cdc762e9d857cc0 — DOI: https://doi.org/10.1126/scirobotics.aea6201
Jose Barreiros
Andrew Beaulieu
Aditya Bhat
Science Robotics
Massachusetts Institute of Technology
Cornell University
Toyota Research Institute