ABSTRACT Multimodal sentiment analysis (MSA) has recently encountered two major challenges: non‐textual modalities are often affected by noise, and sentiment intensity differences are difficult to capture. To address these issues, we propose a Sentiment Intensity Contrastive Text‐Enhanced Fusion Network (SICTEF Net), which achieves deep collaboration among text, audio, and visual modalities through three key mechanisms. First, a grouped‐channel‐attention based Feature Enhancement Module (EMA) is designed to mitigate modality‐specific noise and emphasize emotion‐sensitive cues by combining spatial–channel interaction mapping with dual‐branch attention fusion. Second, a text‐centered cross‐modal fusion mechanism is introduced, where bidirectional multi‐head self‐attention and a residual‐enhanced encoder jointly enable complementary mappings between text and non‐text modalities, thereby producing intermediate representations that preserve semantic primacy while incorporating fine‐grained complementary information. Third, a sentiment‐intensity weighted contrastive learning strategy dynamically assigns weights to positive and negative sample pairs according to their sentiment intensity differences, allowing the model to more precisely distinguish samples with varying degrees of similarity in the embedding space. Experimental evaluation on the CMU‐MOSI and CMU‐MOSEI datasets demonstrates that SICTEF Net consistently outperforms state‐of‐the‐art baselines in binary accuracy, F1 score, seven‐class accuracy, mean absolute error (MAE), and Pearson correlation. Comprehensive ablation studies further confirm the complementary benefits of EMA, the text‐enhanced Transformer, and sentiment‐intensity contrastive learning. These results indicate that combining text‐driven deep interaction, non‐text modality enhancement via channel attention, and contrastive learning can improve the accuracy and robustness of multimodal sentiment analysis.
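The abstract describes the third mechanism only in words, so the following is a minimal sketch of one way a sentiment-intensity weighted contrastive loss could look, not the authors' formulation. Everything in it is an assumption made for illustration: the function name `intensity_weighted_contrastive_loss`, the rule that two samples form a positive pair when their scores share a polarity, the linear weights (positives weighted by `1 - gap`, negatives by `gap`), the temperature, and a score range of 6.0, chosen because CMU-MOSI labels lie in [-3, 3].

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intensity_weighted_contrastive_loss(embeddings, scores,
                                        temperature=0.5, score_range=6.0):
    """Hypothetical InfoNCE-style loss with intensity-dependent pair weights.

    Assumptions (not taken from the paper):
      * positive pair  = both scores have the same polarity
      * positive weight = 1 - gap   (closer intensity, stronger pull)
      * negative weight = gap       (larger intensity gap, stronger push)
    where gap = |score_i - score_j| / score_range.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives, negatives = [], []
        for j in range(n):
            if j == i:
                continue
            gap = abs(scores[i] - scores[j]) / score_range
            sim = math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            if (scores[i] >= 0) == (scores[j] >= 0):
                positives.append((1.0 - gap, sim))
            else:
                negatives.append((gap, sim))
        weight_sum = sum(w for w, _ in positives)
        if not positives or weight_sum == 0:
            continue  # anchor has no usable positive pair
        # Weighted mass of negatives enters the denominator of every positive term.
        neg_mass = sum(w * s for w, s in negatives)
        loss_i = -sum(w * math.log(s / (s + neg_mass))
                      for w, s in positives) / weight_sum
        total += loss_i
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two positive-polarity and two negative-polarity samples.
scores = [2.5, 2.0, -2.5, -2.0]
aligned = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
shuffled = [[1.0, 0.0], [-1.0, 0.0], [0.9, 0.1], [-0.9, -0.1]]
loss_aligned = intensity_weighted_contrastive_loss(aligned, scores)
loss_shuffled = intensity_weighted_contrastive_loss(shuffled, scores)
```

In this toy batch, `aligned` places same-polarity samples close together in the embedding space and `shuffled` does the opposite, so `loss_aligned` comes out smaller than `loss_shuffled`. Under the assumed weights, a negative pair with a large intensity gap contributes more to the denominator than one with a small gap, which is the kind of graded separation the abstract describes.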
Heng Jiang
Lianke Shi
Deyu Kong
Concurrency and Computation Practice and Experience
Henan University
Jiang et al. (Wed,) studied this question.
www.synapsesocial.com/papers/69a75cfbc6e9836116a264ff — DOI: https://doi.org/10.1002/cpe.70593