What question did this study set out to answer?

The study aims to enhance multimodal sarcasm detection by addressing both inter-modal and intra-modal incongruities.

March 14, 2026 | Open Access

Granularity-guided fusion for multi-modal sentiment understanding

Key Points

The study aims to enhance multimodal sarcasm detection by addressing both inter-modal and intra-modal incongruities.
Proposed the Granularity-Based Inter and Intra-Modal Fusion Network (GIIFN)
Utilized pre-trained visual and language models to extract semantic features from images and text
Introduced a learnable granularity grouping module for adaptive feature partitioning
Designed a bidirectional cross-attention mechanism to fuse features at each granularity level
Achieved state-of-the-art performance in multimodal sarcasm detection
Modeled intra-modal semantic information more effectively
Improved handling of incongruity in both visual and textual modalities

Abstract

Multimodal sarcasm detection involves identifying sarcasm across multiple modalities, with the key challenge being modeling incongruity within and between modalities. Current methods often focus on inter-modal incongruity while underexploring intra-modal semantic information. To address this, we propose the Granularity-Based Inter and Intra-Modal Fusion Network (GIIFN). We leverage pre-trained visual and language models to extract semantic features from images and text, and introduce a learnable granularity grouping module to adaptively partition features into multiple semantic granularities. Furthermore, we design a bidirectional cross-attention mechanism to fuse intra-modal and inter-modal features at each granularity level. Experiments demonstrate that our approach achieves state-of-the-art performance.
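The abstract names two components, a granularity grouping module and a bidirectional cross-attention mechanism, without giving their equations. The following is a minimal sketch of how such components could fit together, not the authors' implementation. It assumes that grouping is a soft assignment of token features to a small set of learnable prototype vectors, and that fusion is plain scaled dot-product attention applied in both directions without learned projections. All function names, the prototype formulation, and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def group_by_granularity(features, prototypes):
    """Soft-assign each feature vector to one group per prototype.

    Assumption: `prototypes` stands in for learnable parameters; here they
    are fixed by hand. Each group is the assignment-weighted mean of the
    input features.
    """
    # assignments[t][g] = probability that feature t belongs to group g
    assignments = [softmax([dot(f, p) for p in prototypes]) for f in features]
    groups = []
    for g in range(len(prototypes)):
        column = [row[g] for row in assignments]
        mass = sum(column)  # strictly positive, since softmax outputs are > 0
        groups.append(weighted_sum([c / mass for c in column], features))
    return groups


def cross_attend(queries, keys_values):
    """Scaled dot-product attention; keys and values share the same vectors."""
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append(weighted_sum(weights, keys_values))
    return attended


def bidirectional_cross_attention(text_groups, image_groups):
    """Attend text to image and image to text, returning both directions."""
    text_to_image = cross_attend(text_groups, image_groups)
    image_to_text = cross_attend(image_groups, text_groups)
    return text_to_image, image_to_text


# Toy 2-dimensional features standing in for pre-trained encoder outputs.
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_patches = [[0.5, 0.5], [1.0, -1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical: two granularity groups

text_groups = group_by_granularity(text_tokens, prototypes)
image_groups = group_by_granularity(image_patches, prototypes)
t2i, i2t = bidirectional_cross_attention(text_groups, image_groups)
```

In a trained model the prototypes and attention projections would be learned, and the fused outputs would feed a sarcasm classifier. The sketch only illustrates the data flow: features are grouped per modality, then each modality's groups attend to the other's.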

Authors

Mingxuan Chen

China Meteorological Administration

Huarong Tang

Chen Sun

Journals

Scientific Reports
