June 14, 2024 · Open Access

Localizing Events in Videos with Multimodal Queries

Abstract

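Video understanding is a pivotal task in the digital era, yet the dynamic, multi-event nature of videos makes them labor-intensive and computationally demanding to process. Localizing a specific event given a semantic query has therefore become increasingly important both in user-oriented video search and in academic research on video foundation models. A significant limitation of current research is that semantic queries are typically natural-language descriptions of the target event's semantics. This setting overlooks the potential of multimodal semantic queries composed of images and text. To fill this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, and release a new evaluation dataset, ICQ-Highlight. The benchmark evaluates how well models localize an event given a multimodal semantic query consisting of a reference image that depicts the event and a refinement text that adjusts the image's semantics. To evaluate model performance systematically, we design 4 styles of reference images and 5 types of refinement texts, covering different domains. We propose 3 adaptation methods that tailor existing models to the new setting and evaluate 10 state-of-the-art models, ranging from specialized models to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.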

Cite this study

Zhang et al. (Friday) studied this question.

www.synapsesocial.com/papers/68e64d66b6db6435875ddb83 — DOI: https://doi.org/10.48550/arxiv.2406.10079

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

Transferable Video Moment Localization by Moment-Guided Query Prompting · 2024 · 1 citation
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval · 2025
A Survey of Video Datasets for Grounded Event Understanding · 2024
Towards Event-oriented Long Video Understanding · 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning · 2024 · 2 citations

Authors

Gengyuan Zhang

Mang Ling Ada Fok

Yan Xia
