What question did this study set out to answer?

The aim is to develop a novel diffusion architecture that leverages structured state space duality for generating images and videos.

February 21, 2026 | Open Access

DiM-2: Exploiting Structured State Space Duality in Mamba-2 for Unified Image and Video Diffusion

Key Points

Introduced a Dual-Axis SSD Scanner (DASS) that decouples spatial and temporal modeling using independent SSD kernels.
Introduced Semi-Separable Conditioning (SSC), which injects diffusion timestep and conditioning signals via SSD structured matrices.
Proposed an experimental protocol on ImageNet-256, UCF-101, and SkyTimelapse.
Targeted unified image and video generation with a single architecture.
Motivated, on theoretical grounds, the decoupling of spatial and temporal modeling in diffusion models.

Abstract

We propose DiM-2, a diffusion architecture that directly exploits the Structured State Space Duality (SSD) of Mamba-2 for unified image and video generation. Our design introduces (1) a Dual-Axis SSD Scanner (DASS) that decouples spatial and temporal modeling using independent SSD kernels, and (2) Semi-Separable Conditioning (SSC) that injects diffusion timestep and conditioning signals via SSD structured matrices. This technical report presents the architecture design, theoretical motivation, and a proposed experimental protocol on ImageNet-256, UCF-101, and SkyTimelapse.
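The duality that the abstract refers to can be illustrated with a small sketch. It assumes the scalar-decay state space recurrence used by Mamba-2 and shows that the recurrent form and the semiseparable matrix form produce the same output. The `dual_axis_scan` function is a hypothetical illustration of scanning space and time with independent parameter sets. The helper names, the fixed random parameters, and the order in which the two scans are composed are assumptions for illustration, not the DASS design from the report.

```python
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ssd_recurrent(x, a, B, C):
    """Recurrent form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = C_t . h_t."""
    h = [0.0] * len(B[0])
    ys = []
    for x_t, a_t, B_t, C_t in zip(x, a, B, C):
        h = [a_t * h_k + b_k * x_t for h_k, b_k in zip(h, B_t)]
        ys.append(dot(C_t, h))
    return ys


def ssd_matrix(a, B, C):
    """Semiseparable form: M[i][j] = (C_i . B_j) * a_{j+1} * ... * a_i for j <= i."""
    T = len(a)
    M = [[0.0] * T for _ in range(T)]
    for i in range(T):
        decay = 1.0  # empty product on the diagonal
        for j in range(i, -1, -1):
            M[i][j] = decay * dot(C[i], B[j])
            decay *= a[j]
    return M


def ssd_quadratic(x, a, B, C):
    """Matrix form: y = M x with the lower-triangular M from ssd_matrix."""
    return [dot(row, x) for row in ssd_matrix(a, B, C)]


def random_params(length, state_dim, rng):
    # Assumption: fixed random parameters. In Mamba-2 they depend on the input.
    a = [rng.uniform(0.5, 1.0) for _ in range(length)]
    B = [[rng.gauss(0, 1) for _ in range(state_dim)] for _ in range(length)]
    C = [[rng.gauss(0, 1) for _ in range(state_dim)] for _ in range(length)]
    return a, B, C


def dual_axis_scan(video, spatial, temporal):
    """Hypothetical composition: scan tokens within each frame, then scan
    each token position across frames with a separate parameter set."""
    frames = [ssd_recurrent(frame, *spatial) for frame in video]
    columns = [ssd_recurrent(list(col), *temporal) for col in zip(*frames)]
    return [list(row) for row in zip(*columns)]


rng = random.Random(0)
F, P, N = 3, 4, 2  # frames, tokens per frame, state dimension
spatial = random_params(P, N, rng)
temporal = random_params(F, N, rng)
video = [[rng.gauss(0, 1) for _ in range(P)] for _ in range(F)]

# The two forms of the same operator should agree up to rounding error.
y_rec = ssd_recurrent(video[0], *spatial)
y_mat = ssd_quadratic(video[0], *spatial)
max_gap = max(abs(p - q) for p, q in zip(y_rec, y_mat))

out = dual_axis_scan(video, spatial, temporal)
print(max_gap < 1e-9, len(out), len(out[0]))
```

The recurrent form costs time linear in sequence length, while the matrix form is quadratic but maps onto dense matrix multiplication. Because both describe the same operator, an implementation is free to pick whichever suits the axis length, which is one plausible reason to give the spatial and temporal axes separate kernels.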
