Generating high-fidelity, coherent long videos remains a sought-after goal. Although recent video diffusion models show promise, they still suffer from spatiotemporal inconsistencies and high computational demands. We propose Global-Local Collaborative Diffusion (GLC-Diffusion), a tuning-free method for long video generation. It models long-video denoising by establishing denoising trajectories through Global-Local Collaborative Denoising (GLCD), which ensures overall content consistency and temporal coherence between frames. We also introduce a Noise Reinitialization strategy that combines local noise shuffling with frequency fusion to improve global content consistency and visual diversity. Further, we propose a Video Motion Consistency Refinement (VMCR) module that computes the gradients of pixel-wise and frequency-wise losses to enhance visual consistency and temporal smoothness. Extensive experiments, including quantitative and qualitative evaluations on videos of varying lengths (e.g., 3× and 6× longer), demonstrate that our method integrates effectively with existing video diffusion models and produces coherent, high-fidelity long videos superior to those of previous approaches.
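The abstract names the two ingredients of Noise Reinitialization (local noise shuffling and frequency fusion) without giving their details. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. The function names, the `(frames, channels, height, width)` layout, the choice to fuse along the temporal axis only, and the `cutoff` parameter are all assumptions made for illustration.

```python
import numpy as np


def shuffle_local_noise(base, total_frames, rng):
    """Extend base noise of shape (F, C, H, W) to total_frames frames.

    Assumption: each extra window is a frame-wise permutation of the base
    window, so every window reuses the same noise frames in a new order.
    """
    num_base = base.shape[0]
    chunks = [base]
    count = num_base
    while count < total_frames:
        perm = rng.permutation(num_base)  # shuffle whole frames, not pixels
        chunks.append(base[perm])
        count += num_base
    return np.concatenate(chunks, axis=0)[:total_frames]


def fuse_frequencies(low_src, high_src, cutoff=0.25):
    """Mix two noise tensors along the frame axis in the Fourier domain.

    Temporal frequencies at or below `cutoff` (as a fraction of the Nyquist
    frequency) come from low_src; all higher ones come from high_src.
    Assumption: an ideal (hard) low-pass mask is used.
    """
    num_frames = low_src.shape[0]
    freqs = np.abs(np.fft.fftfreq(num_frames))  # cycles per frame, 0 to 0.5
    mask = (freqs <= cutoff * 0.5).astype(np.float64).reshape(-1, 1, 1, 1)
    low_f = np.fft.fft(low_src, axis=0)
    high_f = np.fft.fft(high_src, axis=0)
    fused = np.fft.ifft(low_f * mask + high_f * (1.0 - mask), axis=0)
    # The mask is symmetric in frequency, so the result is real up to
    # floating-point round-off; drop the negligible imaginary part.
    return fused.real


rng = np.random.default_rng(0)
base_noise = rng.standard_normal((16, 4, 8, 8))            # short-clip noise
long_noise = shuffle_local_noise(base_noise, 48, rng)      # 3x longer
fresh_noise = rng.standard_normal(long_noise.shape)        # independent noise
init_noise = fuse_frequencies(long_noise, fresh_noise)     # fused initial noise
```

In this sketch the shuffled noise supplies the low temporal frequencies, which is meant to carry shared content across windows, while the fresh noise supplies the high frequencies, which is meant to add variation. Which source feeds which band in GLC-Diffusion is not stated in the abstract.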
Ma et al. (Wed,) studied this question.
www.synapsesocial.com/papers/698d6eca5be6419ac0d54a98 — DOI: https://doi.org/10.1145/3794855
Yi Ma
Jianzhong Chen
Donglin Di
ACM Transactions on Multimedia Computing Communications and Applications
UNSW Sydney
University of Science and Technology of China
Zhejiang University