What type of study is this?

This is a quantitative study.

September 29, 2025 · Open Access

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Key Points

VORTA achieves a 1.76× end-to-end speedup in video generation without quality loss on VBench, making Video Diffusion Transformers (VDiTs) more practical to deploy.
The method introduces a sparse attention mechanism that efficiently captures long-range dependencies in video sequences.
An adaptive routing strategy replaces full 3D attention with specialized sparse attention variants throughout the sampling process, avoiding redundant long-range interactions.
Combined with other acceleration methods such as caching and step distillation, VORTA reaches up to 14.41× speedup with negligible performance degradation.
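The routing idea in the key points can be illustrated with a toy sketch. It is not the authors' implementation: the branch names, the gate scores, the `route` helper, and the 1D scalar-feature attention are all hypothetical simplifications chosen to show how a router might pick a cheaper attention variant per head.

```python
import math

def attention(q, k, v, allowed):
    """Toy attention over scalar features.

    allowed(i, j) returns True when query i may attend to key j.
    """
    out = []
    for i, qi in enumerate(q):
        idx = [j for j in range(len(k)) if allowed(i, j)]
        logits = [qi * k[j] for j in idx]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        out.append(sum(w * v[j] for w, j in zip(weights, idx)) / total)
    return out

def full(i, j):
    """Full attention: every query sees every key."""
    return True

def local(radius):
    """Local attention: a query sees keys within `radius` positions."""
    return lambda i, j: abs(i - j) <= radius

def route(gate_scores):
    """Hypothetical hard router: pick the branch with the highest gate score."""
    return max(gate_scores, key=gate_scores.get)

# Assumed branches and gate scores, for illustration only.
branches = {"full": full, "local": local(radius=1)}
q = [0.1, 0.4, -0.2, 0.3]
k = [0.5, -0.1, 0.2, 0.0]
v = [1.0, 2.0, 3.0, 4.0]

choice = route({"full": 0.2, "local": 0.8})
output = attention(q, k, v, branches[choice])
```

In this sketch the router is a simple argmax over fixed scores. The source only states that routing is adaptive and applied throughout the sampling process, so how the scores are produced is left open here.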

Abstract

Video Diffusion Transformers (VDiTs) have achieved remarkable progress in high-quality video generation, but remain computationally expensive due to the quadratic complexity of attention over high-dimensional video sequences. Recent attention acceleration methods leverage the sparsity of attention patterns to improve efficiency; however, they often overlook inefficiencies of redundant long-range interactions. To address this problem, we propose VORTA, an acceleration framework with two novel components: 1) a sparse attention mechanism that efficiently captures long-range dependencies, and 2) a routing strategy that adaptively replaces full 3D attention with specialized sparse attention variants throughout the sampling process. It achieves a 1.76× end-to-end speedup without quality loss on VBench. Furthermore, VORTA can seamlessly integrate with various other acceleration methods, such as caching and step distillation, reaching up to 14.41× speedup with negligible performance degradation. VORTA demonstrates its efficiency and enhances the practicality of VDiTs in real-world settings.
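To make the quadratic cost mentioned in the abstract concrete, the sketch below counts query-key pairs for full 3D attention and for a local window over a frames × height × width token grid. The grid size, the window shape, and the function names are illustrative assumptions, not values or APIs from the paper.

```python
def full_attention_pairs(frames, height, width):
    """Full 3D attention: every token attends to every token."""
    n = frames * height * width
    return n * n

def axis_pairs(size, window):
    """Count (query, key) index pairs along one axis within a centered window."""
    r = window // 2
    return sum(min(size - 1, q + r) - max(0, q - r) + 1 for q in range(size))

def windowed_attention_pairs(frames, height, width, window):
    """Local 3D window attention, with windows clipped at the grid borders.

    The window is separable, so the total is the product of per-axis counts.
    """
    wt, wh, ww = window
    return axis_pairs(frames, wt) * axis_pairs(height, wh) * axis_pairs(width, ww)

# Assumed toy grid: 4 frames of 4 x 4 tokens, with a 3 x 3 x 3 window.
dense = full_attention_pairs(4, 4, 4)
sparse = windowed_attention_pairs(4, 4, 4, (3, 3, 3))
ratio = dense / sparse
```

The gap between the two counts widens as the grid grows, because the dense count scales with the square of the token count while the windowed count scales roughly linearly. Pair counts are only a proxy for runtime; the speedups reported in the abstract are end-to-end measurements.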

Cite this study

Sun et al., VORTA: Efficient Video Diffusion via Routing Sparse Attention.

www.synapsesocial.com/papers/68da58d8c1728099cfd111e7 — DOI: https://doi.org/10.48550/arxiv.2505.18809

Authors

Wenhao Sun

Rong-Cheng Tu

Yifu Ding
