We introduce Motion2VecSets, a 4D diffusion model for dynamic surface mesh generation from various ambiguous observations, including a sequence of RGB images, sparse and partial point clouds, and low-resolution voxel grids. While recent methods using neural field representations have shown success in modeling non-rigid objects, conventional feed-forward architectures struggle with noisy, partial, or sparse observations due to their deterministic nature. To address the inherent one-to-many mapping problem, we introduce a diffusion model that explicitly learns the shape and motion distribution of non-rigid objects through an iterative denoising process of compressed latent representations. The diffusion-based priors provide more plausible and diverse reconstructions under ambiguous conditions. Instead of relying on global latent codes, we represent 4D dynamics using latent sets. This novel 4D representation captures local shape and deformation patterns, leading to more accurate non-linear motion capture and significantly improving generalization capacity to unseen motions and identities. For temporally coherent tracking, we jointly denoise latent sets across frames and enable cross-frame information exchange. To reduce computational cost, we design an interleaved spatial-temporal attention block that alternately aggregates deformation latents along spatial and temporal dimensions. Extensive experiments on datasets of humans, animals, and articulated objects demonstrate that Motion2VecSets outperforms prior methods in reconstructing and tracking non-rigid deformations from various imperfect observations. Our implementation is available at https://vveicao.github.io/projects/Motion2VecSets/.
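The interleaved spatial-temporal attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tensor layout `(T, M, C)` (frames, latents per frame, channels), the single-head attention, the residual connections, and the names `self_attention` and `interleaved_block` are all assumptions made for illustration. It only shows the general idea of attending over the latent set within each frame, then over time for each latent index, instead of attending jointly over all `T * M` tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # x: (batch, n, c). Single-head attention over the n axis (hypothetical helper).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def interleaved_block(latents, spatial_w, temporal_w):
    # latents: (T, M, C) = frames x latents-per-frame x channels (assumed layout).
    # Spatial step: each frame is a batch element, tokens are its M latents.
    x = latents + self_attention(latents, *spatial_w)
    # Temporal step: each latent index is a batch element, tokens are the T frames.
    xt = x.transpose(1, 0, 2)
    xt = xt + self_attention(xt, *temporal_w)
    return xt.transpose(1, 0, 2)

def make_weights(rng, c):
    # Random query/key/value projections, for demonstration only.
    return tuple(rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))

rng = np.random.default_rng(0)
T, M, C = 4, 8, 16
latents = rng.standard_normal((T, M, C))
out = interleaved_block(latents, make_weights(rng, C), make_weights(rng, C))
print(out.shape)
```

Under these assumptions, one block compares `T * M**2 + M * T**2` token pairs, whereas joint attention over all `T * M` tokens would compare `(T * M)**2` pairs, which is where the cost reduction comes from.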
Jiapeng Tang
Wei Cao
Biao Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence
University of Illinois Urbana-Champaign
Technical University of Munich
King Abdullah University of Science and Technology
www.synapsesocial.com/papers/69d892d16c1944d70ce04069 — DOI: https://doi.org/10.1109/tpami.2026.3680779