SparseFlow technical report / preprint. Video diffusion transformers (DiTs) achieve strong generative performance, but self-attention cost scales poorly with video length and resolution. Prior efficient-attention methods optimize either the attention algorithm or the execution kernel, but not both. Sparse and linear variants reduce nominal complexity yet yield limited end-to-end speedup under generic kernels, while FlashAttention-style methods accelerate dense attention without exploiting video-specific sparse structure. SparseFlow-Attention is proposed as a hybrid attention design for video DiTs that co-designs approximation structure and GPU execution. SparseFlow decomposes attention into two complementary components: block-sparse temporal attention across frames, motivated by temporal locality and anchor-frame structure, and linear spatial attention within frames, motivated by spatial redundancy and approximate low-rank structure. A denoising-adaptive sparsity schedule varies temporal sparsity with diffusion timestep, applying more aggressive sparsity in early steps and denser connectivity in later steps to preserve fine details. A fused CUDA kernel executes sparse temporal attention and linear spatial aggregation in a single pipelined launch, reducing memory traffic and kernel overhead relative to sequential implementations. A lightweight head-wise routing module selects the sparse or linear path per head via differentiable routing during training and hard routing at inference. Relative to hybrid attention methods such as SALAD, SLA, and VMonarch, SparseFlow's contribution lies in explicitly coupling a video-diffusion-aware hybrid operator with a fused kernel specialized for that operator. The work provides method formulation, complexity analysis, approximation and stability discussion, and simulation-based results for FLOPs, memory, and throughput, alongside analysis of observed trade-offs. 
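As an illustration of the denoising-adaptive sparsity schedule and the block-sparse temporal pattern described above, the following is a minimal sketch and not the report's implementation. The abstract does not give the schedule's functional form, the mask layout, or any parameter values, so everything here is an assumption made for illustration: the linear ramp, the function names (`keep_ratio`, `temporal_block_mask`, `mask_for_step`, `density`), the defaults (`min_keep`, `max_keep`, `anchor_stride`), and the rule that maps the keep ratio to a window radius. Each frame is treated as one attention block.

```python
def keep_ratio(step, num_steps, min_keep=0.25, max_keep=1.0):
    """Fraction of temporal connectivity kept at a denoising step.

    Assumption: a linear ramp from min_keep (first step, most sparse)
    to max_keep (last step, densest). The report may use another form.
    """
    if num_steps < 2:
        return max_keep
    progress = step / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return min_keep + (max_keep - min_keep) * progress


def temporal_block_mask(num_frames, window, anchor_stride):
    """Return mask[i][j], True when frame block i may attend to frame block j."""
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            local = abs(i - j) <= window     # temporal locality
            anchor = j % anchor_stride == 0  # anchor frames stay visible to all queries
            row.append(local or anchor)
        mask.append(row)
    return mask


def mask_for_step(step, num_steps, num_frames, anchor_stride=4):
    """Widen the local window as denoising progresses (hypothetical mapping)."""
    ratio = keep_ratio(step, num_steps)
    window = round(ratio * (num_frames - 1))
    return temporal_block_mask(num_frames, window, anchor_stride)


def density(mask):
    """Fraction of frame-block pairs that are kept by the mask."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


if __name__ == "__main__":
    # Compare the first, middle, and last of 10 denoising steps for 8 frames.
    for step in (0, 5, 9):
        m = mask_for_step(step, num_steps=10, num_frames=8)
        print(step, round(density(m), 3))
```

Under these assumptions the mask is sparsest at the first denoising step and becomes fully dense at the last one, which matches the direction of the schedule stated in the abstract. A block-sparse kernel would skip the frame-block pairs marked `False`. The fused kernel and the head-wise routing module are outside the scope of this sketch.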
Existing OSF archival DOI: 10.17605/OSF.IO/6EMFW; Existing OSF archival page: https://osf.io/6emfw/. Files include the technical report PDF and the LaTeX source tarball when available.
Haopeng Jin (Mon,) studied this question.
www.synapsesocial.com/papers/69ec5b2388ba6daa22dacbc0 — DOI: https://doi.org/10.5281/zenodo.19712508
Haopeng Jin
Beijing University of Posts and Telecommunications