What question did this study set out to answer?

April 10, 2026

Motion2VecSets: Non-Rigid Shape Reconstruction and Tracking with 4D Latent Set Diffusion

Key Points

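The central aim is to develop a diffusion model that generates dynamic surface meshes from ambiguous observations such as RGB image sequences, sparse or partial point clouds, and low-resolution voxel grids.
Introduces Motion2VecSets, a 4D diffusion model for dynamic surface mesh generation.
Learns the shape and motion distribution through iterative denoising of compressed latent representations.
Represents 4D dynamics with latent sets rather than global latent codes, capturing local shape and deformation patterns.
Uses an interleaved spatial-temporal attention block that alternates aggregation along the spatial and temporal dimensions to reduce computational cost.
Reconstructs non-rigid deformations from partial observations more accurately than prior methods.
Improves generalization to unseen motions and identities.
Outperforms prior methods in reconstruction and tracking on datasets of humans, animals, and articulated objects.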
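
The iterative denoising mentioned in the key points can be illustrated with a toy loop. This is a minimal sketch, not the authors' sampler: the page does not say which sampling rule or noise schedule the paper uses, so the deterministic DDIM-style update, the cosine-shaped `alpha_bar` schedule, and every function name here are assumptions. `oracle_noise` is a hypothetical stand-in for the trained network, which in the real model would be conditioned on the observations; it is given the answer so that the loop can run without training.

```python
import math
import random

def alpha_bar(step, num_steps):
    """Assumed schedule: signal level 1.0 at step 0, small but non-zero at the last step."""
    return math.cos(0.5 * math.pi * step / (num_steps + 1)) ** 2

def oracle_noise(x_t, target, ab):
    """Hypothetical stand-in for a trained denoiser: the noise that maps `target` to x_t."""
    return [(x - math.sqrt(ab) * y) / math.sqrt(1.0 - ab) for x, y in zip(x_t, target)]

def denoise(target, num_steps=10, seed=0):
    """Run a deterministic DDIM-style reverse process on a flattened latent vector."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
    for step in range(num_steps, 0, -1):
        ab_now = alpha_bar(step, num_steps)
        ab_next = alpha_bar(step - 1, num_steps)
        eps = oracle_noise(x, target, ab_now)
        # Estimate the clean latent implied by the predicted noise.
        clean = [(xi - math.sqrt(1.0 - ab_now) * e) / math.sqrt(ab_now)
                 for xi, e in zip(x, eps)]
        # Move to the next, less noisy level using the same noise estimate.
        x = [math.sqrt(ab_next) * c + math.sqrt(1.0 - ab_next) * e
             for c, e in zip(clean, eps)]
    return x

if __name__ == "__main__":
    latent = [0.5, -1.0, 2.0]
    print(denoise(latent))
```

Because the stand-in denoiser is exact, the loop recovers the target latent up to floating-point error. A learned denoiser would instead produce one plausible sample per random seed, which is where the diversity under ambiguous observations comes from.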

Abstract

We introduce Motion2VecSets, a 4D diffusion model for dynamic surface mesh generation from various ambiguous observations, including a sequence of RGB images, sparse and partial point clouds, and low-resolution voxel grids. While recent methods using neural field representations have shown success in modeling non-rigid objects, conventional feed-forward architectures struggle with noisy, partial, or sparse observations due to their deterministic nature. To address the inherent one-to-many mapping problem, we introduce a diffusion model that explicitly learns the shape and motion distribution of non-rigid objects through an iterative denoising process of compressed latent representations. The diffusion-based priors provide more plausible and diverse reconstructions under ambiguous conditions. Instead of relying on global latent codes, we represent 4D dynamics using latent sets. This novel 4D representation captures local shape and deformation patterns, leading to more accurate non-linear motion capture and significantly improving generalization capacity to unseen motions and identities. For temporally coherent tracking, we jointly denoise latent sets across frames and enable cross-frame information exchange. To reduce computational cost, we design an interleaved spatial-temporal attention block that alternately aggregates deformation latents along spatial and temporal dimensions. Extensive experiments on datasets of humans, animals, and articulated objects demonstrate that Motion2VecSets outperforms prior methods in reconstructing and tracking non-rigid deformations from various imperfect observations. Our implementation is available at https://vveicao.github.io/projects/Motion2VecSets/.
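
The interleaved spatial-temporal attention described in the abstract can be sketched as follows. This is an illustrative toy under stated assumptions, not the released implementation: `attend` uses single-head attention with identity projections instead of learned weights, and the names `attend`, `interleaved_block`, and `pair_counts` are hypothetical. Latents are stored as `latents[t][m]`, a feature vector for frame `t` and set element `m`.

```python
import math

def attend(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * value[c] for w, value in zip(weights, tokens)) / total
                    for c in range(dim)])
    return out

def interleaved_block(latents):
    """Apply a spatial pass within each frame, then a temporal pass across frames."""
    num_frames, set_size = len(latents), len(latents[0])
    # Spatial pass: latents of the same frame exchange information.
    spatial = [attend(frame) for frame in latents]
    # Temporal pass: each set element attends to itself across all frames.
    columns = [attend([spatial[t][m] for t in range(num_frames)])
               for m in range(set_size)]
    return [[columns[m][t] for m in range(set_size)] for t in range(num_frames)]

def pair_counts(num_frames, set_size):
    """Query-key pairs for joint attention over all tokens vs. the interleaved scheme."""
    full = (num_frames * set_size) ** 2
    interleaved = num_frames * set_size ** 2 + set_size * num_frames ** 2
    return full, interleaved

if __name__ == "__main__":
    toy = [[[float(t), float(m)] for m in range(3)] for t in range(2)]
    print(interleaved_block(toy))
    print(pair_counts(4, 8))  # (1024, 384)
```

For 4 frames of 8 latents each, joint attention over all 32 tokens scores 1024 query-key pairs, while the interleaved scheme scores 384. The gap widens as the number of frames or the latent set size grows, which is the sense in which alternating the two axes reduces cost.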

Authors

Jiapeng Tang

Wei Cao

Biao Zhang

Journals

IEEE Transactions on Pattern Analysis and Machine Intelligence

Institutions

University of Illinois Urbana-Champaign

Technical University of Munich

King Abdullah University of Science and Technology
