What does this research mean for the field?

A contact and shape-aware latent motion diffusion model (CoShMDM) that incorporates a mesh penetration avoidance policy improves the realism of text-generated two-person interactions, reducing mesh penetrations by up to 19% and increasing contact similarity by up to 33.2% compared to state-of-the-art approaches. Novelty: methodological. Consensus alignment: neutral.
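To make "mesh penetration" concrete, here is a minimal sketch of a penetration measure built on a signed distance field (SDF), the kind of quantity an SDF-guided avoidance policy could be trained to reduce. It assumes, purely for illustration, that the second person's body is approximated by a union of spheres instead of the SMPL mesh SDF used in the paper; the names sphere_sdf, penetration_score, and relative_reduction are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Sphere = Tuple[Point, float]  # (center, radius): hypothetical body-part proxy, not an SMPL mesh


def sphere_sdf(p: Point, spheres: List[Sphere]) -> float:
    """Signed distance from p to a union of spheres (negative inside, positive outside)."""
    return min(math.dist(p, center) - radius for center, radius in spheres)


def penetration_score(verts_a: List[Point], spheres_b: List[Sphere]) -> float:
    """Total depth by which person A's vertices sink inside person B's proxy volume."""
    return sum(max(0.0, -sphere_sdf(v, spheres_b)) for v in verts_a)


def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction of a penetration score relative to a baseline score."""
    return (before - after) / before * 100.0


# Toy example: one unit sphere stands in for person B, three vertices for person A.
spheres_b = [((0.0, 0.0, 0.0), 1.0)]
verts_a = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(penetration_score(verts_a, spheres_b))  # 1.5 (depths 1.0 + 0.5 + 0.0)
```

Under this reading, a 19% reduction means the method's penetration score is 0.81 times the baseline's, so relative_reduction(1.0, 0.81) is about 19.0. The exact penetration metric reported in the paper may differ.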

What question did this study set out to answer?

The aim is to generate realistic two-person interaction motions from text while overcoming limitations in existing models regarding contact and shape diversity.

March 22, 2026

CoShMDM: Contact and Shape-Aware Latent Motion Diffusion Model for Human Interaction Generation

Key Points

Developed a contact and shape-aware latent motion representation and diffusion model (CoShMDM).
Used SMPL-based meshes and a normal alignment-based mesh contact matrix to capture fine-grained, mesh-level contacts.
Implemented a reinforcement learning-based policy, guided by signed distance fields, to avoid mesh penetrations.
Employed a dual-encoder VQ-VAE to learn disentangled latent representations for motion and contacts.
Integrated a novel contact and motion consistency module into the diffusion transformer.
Outperformed state-of-the-art approaches with the lowest FID scores on InterHuman (4.801) and InterX (0.013).
Achieved 19% and 17.3% reductions in mesh penetrations on InterHuman and InterX, respectively.
Gained 17.8% and 33.2% improvements in contact similarity on InterHuman and InterX, respectively.
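The mesh contact matrix mentioned in the key points can be pictured with the sketch below. It assumes a simple criterion: vertex i of person A and vertex j of person B are in contact when they lie within a distance threshold and their unit normals point roughly toward each other. The thresholds max_dist and min_opposition and the function contact_matrix are hypothetical, not values or code from the paper.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def contact_matrix(
    verts_a: List[Vec], normals_a: List[Vec],
    verts_b: List[Vec], normals_b: List[Vec],
    max_dist: float = 0.02,       # hypothetical contact distance (metres assumed)
    min_opposition: float = 0.5,  # hypothetical: require dot(n_a, n_b) <= -0.5
) -> List[List[int]]:
    """C[i][j] = 1 if vertex i of A and vertex j of B are close and their unit normals face each other."""
    return [
        [
            1 if math.dist(va, vb) <= max_dist and dot(na, nb) <= -min_opposition else 0
            for vb, nb in zip(verts_b, normals_b)
        ]
        for va, na in zip(verts_a, normals_a)
    ]


# Toy example: A's first vertex touches B's first vertex (opposed normals);
# B's second vertex is also close but faces the same way, so it is not a contact.
verts_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals_a = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
verts_b = [(0.0, 0.0, 0.01), (0.0, 0.0, 0.015)]
normals_b = [(0.0, 0.0, -1.0), (0.0, 0.0, 1.0)]
C = contact_matrix(verts_a, normals_a, verts_b, normals_b)
print(C)  # [[1, 0], [0, 0]]
```

The normal test discards vertex pairs that are merely near each other, so the matrix records surfaces that actually face and touch. An SMPL mesh has 6,890 vertices per person, so a practical version would likely replace this all-pairs loop with a spatial index.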

Abstract

Generating realistic two-person interaction motions from text holds immense potential in computer vision and animation. While existing latent motion diffusion models offer compact and efficient representations, they often fail to produce physically plausible contacts and are typically constrained to a single canonical body shape. As a result, the generated motion sequences exhibit substantial mesh penetrations and lack interaction realism. To address these limitations, we propose a contact and shape-aware latent motion representation and diffusion model (CoShMDM) for generating realistic two-person interactions from text. Our framework begins by constructing contact-compatible motion using SMPL-based meshes and a normal alignment-based mesh contact matrix to capture fine-grained mesh-level contacts. To account for shape diversity, we incorporate SMPL shape parameters and iteratively learn contact dynamics across different body shapes. Additionally, a reinforcement learning-based mesh penetration avoidance policy network, guided by signed distance fields, is introduced to minimize mesh penetrations while preserving contact fidelity and shape-aware motion. We further employ a dual-encoder VQ-VAE to learn disentangled latent representations for motion and contacts, which are then utilized in a text- and body-shape-conditioned diffusion model. To ensure spatial, temporal, and semantic coherence, we integrate a novel contact and motion consistency module into the diffusion transformer. Extensive evaluations on the InterHuman and InterX datasets demonstrate that our method outperforms state-of-the-art approaches, achieving the lowest FID scores (4.801 and 0.013), 19% and 17.3% reductions in mesh penetrations, and 17.8% and 33.2% gains in contact similarity, respectively.
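The dual-encoder VQ-VAE turns motion and contacts into separate streams of discrete tokens. The sketch below shows only the vector-quantization step of a standard VQ-VAE, assuming one codebook per stream; the encoders, decoder, straight-through gradient, and commitment loss are omitted, and the function names and toy codebooks are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(latents: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Replace each latent vector with its nearest codeword under squared Euclidean distance."""
    indices: List[int] = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices, [codebook[k] for k in indices]


def dual_quantize(motion_latents, contact_latents, motion_codebook, contact_codebook):
    """Tokenize motion and contact latents with separate (hypothetical) codebooks."""
    motion_ids, _ = quantize(motion_latents, motion_codebook)
    contact_ids, _ = quantize(contact_latents, contact_codebook)
    return motion_ids, contact_ids


# Toy 2-D motion codebook and 1-D contact codebook.
motion_codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
contact_codebook = [(0.0,), (1.0,)]
motion_ids, contact_ids = dual_quantize(
    [(0.9, 1.2), (1.8, 0.1)], [(0.2,), (0.7,)], motion_codebook, contact_codebook
)
print(motion_ids, contact_ids)  # [1, 2] [0, 1]
```

With separate codebooks, a contact token can change without forcing a change in the paired motion token, which is one way to read "disentangled" here; whether CoShMDM uses exactly this arrangement is an assumption.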

Cite this study

Manjotho et al. (2026) studied this question.

www.synapsesocial.com/papers/69bf8978f665edcd009e91ab — DOI: https://doi.org/10.1109/tvcg.2026.3675725

Authors

Ali Asghar Manjotho

Tekie Tsegay Tewolde

Ramadhani Ally Duma

Journals

IEEE Transactions on Visualization and Computer Graphics

Institutions

Beijing Institute of Technology

The University of Dodoma

Beijing Electronic Science and Technology Institute
