March 3, 2026
Open Access

Building Prototype Evolution Pathway for Emotion Recognition in User-Generated Videos

Key Points

This research aims to enhance emotion recognition in user-generated videos by addressing the affective domain gap and limited emotion annotations.
Introduced a progressive prototype evolution framework for emotional representation.
Leveraged auxiliary cross-modal priors to improve unimodal emotion modeling.
Designed category-aggregated prompting and bidirectional supervision mechanisms.
Achieved state-of-the-art results on VideoEmotion-8, Ekman-6, and MusicVideo-6 datasets.
Demonstrated the effectiveness of auxiliary modality priors in emotion recognition.

Abstract

Large-scale pretrained foundation models are increasingly essential for affective analysis in user-generated videos. However, current approaches typically reuse generic multi-modal representations directly, pairing them with task-specific adapters learned from scratch, so their performance is limited by the large affective domain gap and the scarcity of emotion annotations. To address these issues, we introduce a novel paradigm that leverages auxiliary cross-modal priors to enhance unimodal emotion modeling, effectively exploiting both modality-shared semantics and modality-specific inductive biases. Specifically, we propose a progressive prototype evolution framework that gradually transforms a neutral prototype into discriminative emotional representations through fine-grained cross-modal interactions with visual cues. The auxiliary prior serves as a structural constraint, reframing the adaptation challenge from a difficult domain shift problem into a more tractable prototype shift within the affective space. To ensure robust prototype construction and guided evolution, we further design category-aggregated prompting and bidirectional supervision mechanisms. Extensive experiments on VideoEmotion-8, Ekman-6, and MusicVideo-6 validate the superiority of our approach, which achieves state-of-the-art results and demonstrates the effectiveness of leveraging auxiliary modality priors for foundation-model-based emotion recognition.
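The abstract describes the evolution pathway only at a high level. The following toy Python sketch is meant to make the idea of a prototype shift concrete, and it is not the authors' implementation. It rests on several assumptions that do not come from the study. Each evolution step is taken to be a single-head dot-product cross-attention from the current prototype to a set of visual token embeddings, followed by a partial move toward the attended context. "Category-aggregated prompting" is approximated by averaging several prompt embeddings per emotion category into one anchor. The evolved prototype is then labeled by cosine similarity to those anchors. The function names, the 3-dimensional embeddings, and the `step_size` parameter are all hypothetical, and bidirectional supervision (a training-time signal) is not modeled.

```python
import math
from typing import Dict, List

Vector = List[float]  # toy embedding; real models use high-dimensional features


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, tokens: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention from one query to visual tokens."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, t) * scale for t in tokens])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(query))]


def evolve_prototype(neutral: Vector, visual_tokens: List[Vector],
                     steps: int = 3, step_size: float = 0.5) -> List[Vector]:
    """Hypothetical progressive evolution: each step moves the prototype a
    fraction `step_size` of the way toward its attended visual context."""
    proto = list(neutral)
    trajectory = [list(proto)]
    for _ in range(steps):
        context = cross_attend(proto, visual_tokens)
        proto = [p + step_size * (c - p) for p, c in zip(proto, context)]
        trajectory.append(list(proto))
    return trajectory


def aggregate_prompts(prompts: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Hypothetical stand-in for category-aggregated prompting: average the
    embeddings of several prompts per emotion category into one anchor."""
    return {
        label: [sum(e[i] for e in embs) / len(embs) for i in range(len(embs[0]))]
        for label, embs in prompts.items()
    }


def classify(proto: Vector, anchors: Dict[str, Vector]) -> str:
    # Pick the category whose anchor is most cosine-similar to the prototype.
    return max(anchors, key=lambda label: cosine(proto, anchors[label]))


# Toy usage with made-up 3-d embeddings (one axis per emotion).
anchors = aggregate_prompts({
    "joy": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "sadness": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    "fear": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
})
neutral = [0.3, 0.3, 0.3]  # equally similar to every anchor
joy_like_tokens = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]
trajectory = evolve_prototype(neutral, joy_like_tokens)
print(classify(trajectory[-1], anchors))  # prints "joy"
```

In this toy setting, the prototype starts out equally similar to every category anchor, and each step pulls it toward the visual evidence. This loosely mirrors the abstract's framing of adaptation as a shift of a prototype within the affective space rather than a re-learning of the whole representation.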
