March 3, 2026

Cross-modal spatio-temporal fusion weakly supervised video anomaly detection based on large-scale vision-language models

Spatio-temporal fusion techniques improve the accuracy of video anomaly detection.
The approach uses large-scale vision-language models to detect anomalies effectively.
Weak supervision reduces the need for extensive labeled datasets, making the approach more accessible.
The technique may enable better surveillance systems and automated monitoring across various domains, improving safety and security.

Lihu Pan

Shouxin Peng

Rui Zhang

Multimedia Systems

National Tsing Hua University

Taiyuan University of Science and Technology