What type of study is this?

This is a quantitative study.

October 13, 2025 | Open Access

Tenma: Robust Cross-Embodiment Robot Manipulation with Diffusion Transformer

Key Points

Under matched compute, Tenma achieves an average in-distribution success rate of 88.95%, compared with 18.12% for the best baseline policy.
Tenma is trained on heterogeneous, multimodal robot data (multiview RGB, proprioception, and language) and maintains strong performance under object and scene shifts.
A cross-embodiment normalizer maps disparate state/action spaces into a shared latent space.
The results suggest that multimodal, cross-embodiment training can further extend the capacity of transformer-based imitation learning policies, even at moderate data scale.

Abstract

Scaling Transformer policies and diffusion models has advanced robotic manipulation, yet combining these techniques in lightweight, cross-embodiment learning settings remains challenging. We study the design choices that most affect stability and performance for diffusion-transformer policies trained on heterogeneous, multimodal robot data, and introduce Tenma, a lightweight diffusion-transformer for bimanual arm control. Tenma integrates multiview RGB, proprioception, and language via a cross-embodiment normalizer that maps disparate state/action spaces into a shared latent space; a Joint State-Time encoder for temporally aligned observation learning with inference speed boosts; and a diffusion action decoder optimized for training stability and learning capacity. Across benchmarks and under matched compute, Tenma achieves an average success rate of 88.95% in-distribution and maintains strong performance under object and scene shifts, substantially exceeding baseline policies whose best in-distribution average is 18.12%. Despite using a moderate data scale, Tenma delivers robust manipulation and generalization, indicating the potential of multimodal and cross-embodiment learning strategies to further augment the capacity of transformer-based imitation learning policies.
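
To make the idea of mapping disparate state/action spaces into a shared space more concrete, here is a minimal illustrative sketch. It is not Tenma's implementation: the abstract does not describe the normalizer's internals, so this assumes the simplest possible scheme (per-embodiment standardization followed by zero-padding to a common width, with a validity mask). The class name `CrossEmbodimentNormalizer`, the `shared_dim` parameter, and the embodiment labels are all hypothetical.

```python
from statistics import fmean, pstdev


class CrossEmbodimentNormalizer:
    """Hypothetical sketch: standardize each embodiment's vectors with its
    own statistics, then zero-pad them to one shared width."""

    def __init__(self, shared_dim):
        self.shared_dim = shared_dim
        self.stats = {}  # embodiment name -> (means, stds)

    def fit(self, name, rows):
        """Record per-dimension mean and std for one embodiment."""
        if len(rows[0]) > self.shared_dim:
            raise ValueError("embodiment has more dimensions than shared_dim")
        columns = list(zip(*rows))
        means = [fmean(col) for col in columns]
        # Fall back to 1.0 for constant dimensions to avoid division by zero.
        stds = [pstdev(col) or 1.0 for col in columns]
        self.stats[name] = (means, stds)

    def encode(self, name, vector):
        """Return (padded standardized vector, mask of valid entries)."""
        means, stds = self.stats[name]
        z = [(v - m) / s for v, m, s in zip(vector, means, stds)]
        pad = self.shared_dim - len(z)
        return z + [0.0] * pad, [True] * len(z) + [False] * pad

    def decode(self, name, latent):
        """Invert encode: drop the padding and undo the standardization."""
        means, stds = self.stats[name]
        # zip stops at len(means), which discards the padded tail.
        return [z * s + m for z, m, s in zip(latent, means, stds)]


# Usage: a 2-dimensional and a 4-dimensional embodiment share one width.
norm = CrossEmbodimentNormalizer(shared_dim=4)
norm.fit("single_arm", [[0.0, 1.0], [2.0, 3.0]])
norm.fit("bimanual", [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
latent, mask = norm.encode("single_arm", [2.0, 3.0])
restored = norm.decode("single_arm", latent)
```

In this sketch the mask tells a downstream model which entries carry real data, so padded dimensions can be ignored in attention or in the loss. A learned projection per embodiment would be another plausible design; the abstract does not say which approach the authors use.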

Authors

T. Claire Davies

Yiqi Huang

Yunxin Liu