Generating realistic two-person interaction motions from text holds immense potential in computer vision and animation. While existing latent motion diffusion models offer compact and efficient representations, they often fail to produce physically plausible contacts and are typically constrained to a single canonical body shape. As a result, the generated motion sequences exhibit substantial mesh penetrations and lack interaction realism. To address these limitations, we propose a contact- and shape-aware latent motion representation and diffusion model (CoShMDM) for generating realistic two-person interactions from text. Our framework begins by constructing contact-compatible motion using SMPL-based meshes and a normal alignment-based mesh contact matrix to capture fine-grained mesh-level contacts. To account for shape diversity, we incorporate SMPL shape parameters and iteratively learn contact dynamics across different body shapes. Additionally, we introduce a reinforcement learning-based mesh penetration avoidance policy network, guided by signed distance fields, to minimize mesh penetrations while preserving contact fidelity and shape-aware motion. We further employ a dual-encoder VQ-VAE to learn disentangled latent representations for motion and contacts, which are then used in a text- and body-shape-conditioned diffusion model. To ensure spatial, temporal, and semantic coherence, we integrate a novel contact and motion consistency module into the diffusion transformer. Extensive evaluations on the InterHuman and InterX datasets demonstrate that our method outperforms state-of-the-art approaches, achieving the lowest FID scores (4.801 and 0.013), with 19% and 17.3% reductions in mesh penetrations and 17.8% and 33.2% gains in contact similarity, respectively.
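To make the idea of a normal alignment-based mesh contact matrix more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that two vertices are in contact when they are close together and their unit normals point in roughly opposite directions. The function name `contact_matrix` and the thresholds `dist_thresh` and `align_thresh` are hypothetical and do not come from the paper.

```python
import numpy as np

def contact_matrix(verts_a, normals_a, verts_b, normals_b,
                   dist_thresh=0.02, align_thresh=-0.5):
    """Return a boolean (Na, Nb) matrix of vertex-level contacts.

    Assumption (not from the paper): vertex i of mesh A touches vertex j
    of mesh B when their Euclidean distance is below dist_thresh and the
    cosine between their unit normals is below align_thresh, i.e. the
    two surfaces face each other.
    """
    # Pairwise Euclidean distances between all vertices, shape (Na, Nb).
    diff = verts_a[:, None, :] - verts_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Pairwise cosine of the angle between unit normals, shape (Na, Nb).
    cos = normals_a @ normals_b.T
    return (dist < dist_thresh) & (cos < align_thresh)

# Two tiny "meshes" with two vertices each (coordinates in meters).
verts_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
normals_a = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
verts_b = np.array([[0.0, 0.0, 0.01], [1.0, 0.0, 0.01]])
normals_b = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]])

C = contact_matrix(verts_a, normals_a, verts_b, normals_b)
print(C)
```

In this toy case only the first vertex pair is both close and facing, so only that entry is marked as contact. The second pair is equally close, but its normals point the same way, so it is rejected. The dense pairwise computation is only practical for small vertex counts; a full SMPL mesh would likely need a spatial index or a vertex subset.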
Manjotho et al. studied this question.
www.synapsesocial.com/papers/69bf8978f665edcd009e91ab — DOI: https://doi.org/10.1109/tvcg.2026.3675725
Ali Asghar Manjotho
Tekie Tsegay Tewolde
Ramadhani Ally Duma
IEEE Transactions on Visualization and Computer Graphics
Beijing Institute of Technology
The University of Dodoma
Beijing Electronic Science and Technology Institute