With the rapid advancement of video generation models such as Sora, video quality assessment (VQA) is becoming increasingly crucial for selecting high-quality videos from large-scale datasets used in pre-training. Traditional VQA methods, which typically produce a single numerical score, often lack comprehensiveness and interpretability. To address these challenges, we introduce MVQA-68K, a novel multi-dimensional VQA dataset comprising over 68,000 carefully annotated videos and covering seven essential quality dimensions: overall aesthetics, camera movement, dynamic degree, texture detail, composition, visual quality, and factual consistency. Each annotation includes detailed chain-of-thought reasoning to facilitate interpretability and comprehensive understanding. Extensive experiments demonstrate that MVQA-68K significantly enhances the performance of various multimodal large language models (MLLMs) on the VQA task, achieving state-of-the-art results not only on our internal test set (Fig. 1) but also on public benchmarks including LSVQ-test, LSVQ-1080p, and LIVE-VQC. Meanwhile, incorporating an explicit reasoning process during VQA training substantially boosts zero-shot generalization. Code and dataset will be available on GitHub: https://github.com/Controller01-ai/MVQA-68K
Pu et al. (Mon,) studied this question.
www.synapsesocial.com/papers/68ecfebf950606aabec09528 — DOI: https://doi.org/10.48550/arxiv.2509.11589
Yanyun Pu
Kun Li
Zeyi Huang