What type of study is this?

This is a quantitative study.

October 20, 2025 · Open Access

NeMo: Needle in a Montage for Video-Language Understanding

Key Points

NeMoBench comprises 31,378 automatically generated question-answer pairs from 13,486 videos, with durations ranging from seconds to hours.
The automated data generation pipeline reliably produces high-quality evaluation data, according to the authors' experiments.
Twenty state-of-the-art models were evaluated on NeMoBench, yielding results and insights into their capabilities and limitations in video-language tasks.
Because the data is generated automatically, NeMoBench can be continuously updated with the latest videos.

Abstract

Recent advances in video large language models (VideoLLMs) call for new evaluation protocols and benchmarks for complex temporal reasoning in video-language understanding. Inspired by the needle in a haystack test widely used by LLMs, we introduce a novel task of Needle in a Montage (NeMo), designed to assess VideoLLMs' critical reasoning capabilities, including long-context recall and temporal grounding. To generate video question answering data for our task, we develop a scalable automated data generation pipeline that facilitates high-quality data synthesis. Built upon the proposed pipeline, we present NeMoBench, a video-language benchmark centered on our task. Specifically, our full set of NeMoBench features 31,378 automatically generated question-answer (QA) pairs from 13,486 videos with various durations ranging from seconds to hours. Experiments demonstrate that our pipeline can reliably and automatically generate high-quality evaluation data, enabling NeMoBench to be continuously updated with the latest videos. We evaluate 20 state-of-the-art models on our benchmark, providing extensive results and key insights into their capabilities and limitations. Our project page is available at: https://lavi-lab.github.io/NeMoBench.

Cite this study

Hu et al. studied this question.

www.synapsesocial.com/papers/68f5fcce8d54a28a75cf1c5e — DOI: https://doi.org/10.48550/arxiv.2509.24563

Authors

Zi-Yuan Hu

Shuo Liang

Dan Zheng
