June 4, 2026 | Open Access

ECA-CMF Net: An ECA-Enhanced Conditional Modulation Fusion Network for Multimodal Sentiment Analysis

Key Points

The study aims to improve multimodal sentiment analysis performance by addressing modality inconsistency and enhancing feature representation.
The proposed ECA-CMF Net integrates unified indexing-based preprocessing and heterogeneous feature extraction.
Efficient Channel Attention (ECA) adaptively recalibrates sentiment-relevant channels and suppresses redundant information.
Conditional Modulation Fusion (CMF) generates dynamic scale-and-shift parameters from the joint multimodal context for cross-modal interaction.
The model achieves ACC/F1 scores of 0.8874/0.8870 on CMU-MOSI and 0.7089/0.7008 on CMU-MOSEI.
It improves ACC/F1 by 3.40/3.38 percentage points on CMU-MOSI and 1.53/1.85 percentage points on CMU-MOSEI over the strongest reproduced baselines.
The results indicate improved multimodal collaboration and adaptive fusion.

Abstract

To address modality inconsistency, insufficient intra-modal affective representation, and the limited adaptability of conventional fusion strategies in multimodal sentiment analysis, this study proposes ECA-CMF Net, an Efficient Channel Attention (ECA)-enhanced Conditional Modulation Fusion (CMF) network. The framework integrates unified indexing-based preprocessing, heterogeneous feature extraction with ECA, and CMF to improve multimodal representation learning and sentiment classification. First, sample-level alignment and modality-specific standardisation are applied to the textual, visual, and acoustic inputs to reduce distribution shifts and noise interference. Next, heterogeneous encoders extract modality-specific features, while ECA adaptively recalibrates sentiment-relevant channels and suppresses redundant information. Finally, the CMF mechanism generates modulation parameters from the joint multimodal context to scale and shift each modality's features, enabling dynamic cross-modal interaction and contribution adjustment. Experiments on CMU-MOSI and CMU-MOSEI show that ECA-CMF Net achieves accuracy (ACC)/F1 scores of 0.8874/0.8870 and 0.7089/0.7008, respectively. Compared with the strongest reproduced baselines, it improves ACC/F1 by 3.40/3.38 percentage points on CMU-MOSI and 1.53/1.85 percentage points on CMU-MOSEI, demonstrating improved multimodal collaboration, adaptive fusion, and robustness.
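The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the function names (`eca_gate`, `conditional_modulation`), the tensor shapes, the zero-padded channel convolution, and the linear maps that produce the scale and shift parameters are all assumptions made for illustration. The ECA part follows the general pattern of channel attention (pool, convolve across channels, apply a sigmoid gate); the CMF part follows the scale-and-shift description in the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def eca_gate(features, kernel):
    """Channel recalibration in the style of Efficient Channel Attention.

    features: array of shape (T, C), T time steps and C channels (assumed layout).
    kernel:   1-D array of odd length k, weights of the cross-channel convolution.
    Returns the recalibrated features and the per-channel gate.
    """
    pooled = features.mean(axis=0)            # global average pooling -> (C,)
    pad = len(kernel) // 2
    padded = np.pad(pooled, pad)              # zero padding keeps length C
    # Slide the kernel along the channel axis (local cross-channel interaction).
    scores = np.array(
        [padded[i:i + len(kernel)] @ kernel for i in range(len(pooled))]
    )
    gate = sigmoid(scores)                    # values in (0, 1), one per channel
    return features * gate, gate              # broadcast the gate over time steps


def conditional_modulation(modal_feats, params):
    """Scale and shift each modality using parameters derived from joint context.

    modal_feats: dict mapping a modality name to a feature vector of shape (C,).
    params:      dict mapping a modality name to (W_gamma, b_gamma, W_beta, b_beta),
                 hypothetical linear maps from the joint context to scale and shift.
    """
    # Joint context: concatenation of all modality vectors (an assumption).
    context = np.concatenate([modal_feats[m] for m in sorted(modal_feats)])
    modulated = {}
    for name, h in modal_feats.items():
        w_gamma, b_gamma, w_beta, b_beta = params[name]
        gamma = w_gamma @ context + b_gamma   # scale parameters
        beta = w_beta @ context + b_beta      # shift parameters
        modulated[name] = gamma * h + beta
    return modulated
```

In this sketch, a gate close to 0 suppresses a channel and a gate close to 1 passes it through. With zero weight matrices, unit scale bias, and zero shift bias, `conditional_modulation` returns its input unchanged, which is a convenient sanity check before training any parameters.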
