August 22, 2024Open Access

Adapting CLIP for Action Recognition via Dual Semantic Supervision and Temporal Prompt Reparameterization

Key Points

Key points are not available for this paper at this time.

Abstract

The contrastive vision–language pre-trained model CLIP, driven by large-scale open-vocabulary image–text pairs, has recently demonstrated remarkable zero-shot generalization capabilities in diverse downstream image tasks, which has made numerous models dominated by the “image pre-training followed by fine-tuning” paradigm exhibit promising results on standard video benchmarks. However, as models scale up, full fine-tuning adaptive strategy for specific tasks becomes difficult in terms of training and storage. In this work, we propose a novel method that adapts CLIP to the video domain for efficient recognition without destroying the original pre-trained parameters. Specifically, we introduce temporal prompts to realize the object of reasoning about the dynamic content of videos for pre-trained models that lack temporal cues. Then, by replacing the direct learning style of prompt vectors with a lightweight reparameterization encoder, the model can be adapted to domain-specific adjustment to learn more generalizable representations. Furthermore, we predefine a Chinese label dictionary to enhance video representation by co-supervision of Chinese and English semantics. Extensive experiments on video action recognition benchmarks show that our method achieves competitive or even better performance than most existing methods with fewer trainable parameters in both general and few-shot recognition scenarios.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Lujuan Deng

Jieqing Tan

Fangmei Liu

Journals

Electronics

Actions

Institutions

Zhengzhou University of Light Industry

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Adapting CLIP for Action Recognition via Dual Semantic Supervision and Temporal Prompt Reparameterization

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Authors

Journals

Actions

Institutions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study

Also consider