What question did this study set out to answer?

The research aims to develop a method for generating high-fidelity long videos while overcoming computational challenges.

February 12, 2026

Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion

Key Points

Proposed Global-Local Collaborative Diffusion for denoising long videos.
Implemented Noise Reinitialization for enhanced global content consistency.
Developed Video Motion Consistency Refinement module to improve visual and temporal coherence.
Conducted extensive quantitative and qualitative evaluations on varying video lengths.
GLC-Diffusion effectively integrates with existing video diffusion models.
Produced coherent long videos that outperform previous methods.
Demonstrated high fidelity and improved visual diversity in generated videos.

Abstract

Creating high-fidelity, coherent long videos is a sought-after aspiration. While recent video diffusion models have shown promising potential, they still grapple with spatiotemporal inconsistencies and high computational resource demands. We propose Global-Local Collaborative Diffusion (GLC-Diffusion), a tuning-free method for long video generation. It models the long video denoising process by establishing denoising trajectories through Global-Local Collaborative Denoising (GLCD) to ensure overall content consistency and temporal coherence between frames. Additionally, we introduce a Noise Reinitialization strategy which combines local noise shuffling with frequency fusion to improve global content consistency and visual diversity. Further, we propose a Video Motion Consistency Refinement (VMCR) module that computes the gradient of pixel-wise and frequency-wise losses to enhance visual consistency and temporal smoothness. Extensive experiments, including quantitative and qualitative evaluations on videos of varying lengths (e.g., 3× and 6× longer), demonstrate that our method effectively integrates with existing video diffusion models, producing coherent, high-fidelity long videos superior to previous approaches.
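The abstract describes Noise Reinitialization only at a high level, as local noise shuffling combined with frequency fusion. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes that a short noise clip is extended by shuffling frame order inside each window, and that low temporal frequencies are then taken from the shuffled noise while high frequencies come from fresh Gaussian noise. All function names, the hard frequency cutoff, and the choice of fusing along the time axis only are hypothetical.

```python
import cmath
import random


def shuffle_extend(base, num_frames, rng):
    """Repeat a short noise clip up to num_frames, shuffling frame order per window."""
    window = len(base)
    out = []
    while len(out) < num_frames:
        order = list(range(window))
        if out:  # the first window keeps its original frame order
            rng.shuffle(order)
        out.extend(base[i] for i in order)
    return out[:num_frames]


def dft(xs):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]


def idft(cs):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(cs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(cs)).real / n
            for t in range(n)]


def fuse_temporal(low_src, high_src, cutoff):
    """Keep temporal frequencies up to `cutoff` from low_src, the rest from high_src."""
    n = len(low_src)
    lo, hi = dft(low_src), dft(high_src)
    # min(k, n - k) treats bin k and its mirror bin alike, so the output stays real.
    fused = [lo[k] if min(k, n - k) <= cutoff else hi[k] for k in range(n)]
    return idft(fused)


def reinit_noise(base, num_frames, cutoff, seed=0):
    """Hypothetical reinitialization: shuffle-extend, then fuse with fresh noise."""
    rng = random.Random(seed)
    shuffled = shuffle_extend(base, num_frames, rng)
    pixels = len(base[0])
    fresh = [[rng.gauss(0.0, 1.0) for _ in range(pixels)] for _ in range(num_frames)]
    # Fuse each pixel's time series independently.
    cols = [fuse_temporal([f[p] for f in shuffled], [f[p] for f in fresh], cutoff)
            for p in range(pixels)]
    return [[cols[p][t] for p in range(pixels)] for t in range(num_frames)]
```

In this sketch a frame is a flat list of floats. A larger `cutoff` keeps more of the shuffled noise, which would favor consistency across windows, while a smaller one lets more fresh noise through. How the paper actually balances the two is not stated in the abstract.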

Cite this study

Ma et al. studied this question.

www.synapsesocial.com/papers/698d6eca5be6419ac0d54a98 — DOI: https://doi.org/10.1145/3794855

Authors

Yi Ma

Jianzhong Chen

Donglin Di

Journals

ACM Transactions on Multimedia Computing Communications and Applications

Institutions

UNSW Sydney

University of Science and Technology of China

Zhejiang University
