Diffusion-based and flow-based models have achieved significant progress in video synthesis but require many iterative sampling steps, which incurs substantial computational overhead. Many distillation methods have been developed to accelerate video generation models, but those that rely solely on trajectory preservation or distribution matching often suffer from performance breakdown or increased artifacts in few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach first introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. We then propose a dual-perspective alignment that combines distribution alignment between synthetic and real data with trajectory alignment across different inference steps. Our method maintains high-quality video generation while substantially reducing the number of inference steps. Quantitative evaluations on the OpenVid-1M benchmark demonstrate that our method significantly outperforms existing approaches in few-step video generation.
Sun et al. (Fri,) studied this question.
www.synapsesocial.com/papers/68f10ecee6a12fd042899a73 — DOI: https://doi.org/10.48550/arxiv.2508.06082
Yanxiao Sun
Jiafu Wu
Yun Cao