What question did this study set out to answer?

The research aims to develop an effective multimodal object detection framework that minimizes redundancy and enhances feature alignment across infrared and visible modalities.

March 13, 2026 · Open Access

A study on infrared-visible fusion multimodal object detection algorithm based on cross-modal information bottleneck and minimum redundancy transformation

Key Points

Proposed a multimodal fusion detection framework integrating Cross-modal Information Bottleneck and Minimum Redundancy Transformation.
Developed a compress-decompose-reconstruct pathway to improve cross-modal consistency.
Introduced sparse structural transformations to suppress modality redundancy.
Implemented a dual-phase training strategy for effective representation in both isolated and fused modalities.
Improved mean Average Precision (mAP) on the KAIST nighttime scenario from 42.8% (baseline) to 44.1%.
Achieved an AP@75 of 80.0% under low-light conditions on the LLVIP dataset.
Outperformed the previous state of the art on LLVIP by 2.4%.
Demonstrated robust performance even under occlusion and illumination disturbances.
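
For readers unfamiliar with the metric, AP@75 counts a detection as correct only when its box overlaps a ground-truth box with an intersection over union (IoU) of at least 0.75. The following is a minimal sketch of that matching test, not the paper's evaluation code; the function names and the (x1, y1, x2, y2) box layout are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples; this layout is an assumption
    made for the example, not something the paper specifies.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match_at_75(pred_box, gt_box, threshold=0.75):
    """True when a prediction is close enough to count under AP@75."""
    return iou(pred_box, gt_box) >= threshold
```

A stricter threshold such as 0.75 rewards tight localization, which is why it is a useful complement to the looser overall mAP figures when boundaries are hard to see in low light.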

Abstract

Infrared-visible multimodal object detection plays a vital role in complex environmental conditions. However, existing approaches still suffer from significant limitations in modality redundancy suppression and feature alignment. To address these issues, this paper proposes a novel multimodal fusion detection framework that integrates a Cross-modal Information Bottleneck (CIB) with a Minimum Redundancy Transformation (MRT). The CIB module employs a compress-decompose-reconstruct pathway to selectively preserve shared semantics across modalities, thereby enhancing cross-modal consistency. The MRT module introduces sparse structural transformations along both channel and spatial dimensions, effectively suppressing modality redundancy and strengthening boundary-awareness for target regions. Additionally, we design a dual-phase training strategy based on modality isolation and fusion to stabilize the cooperative representation process. Extensive experiments conducted on two authoritative datasets, KAIST and LLVIP, validate the effectiveness of the proposed approach. Specifically, our method improves the mAP on the KAIST nighttime scenario from 42.8% (Baseline) to 44.1%, and achieves an AP@75 of 80.0% under low-light conditions in LLVIP, outperforming the previous state-of-the-art by 2.4%. Moreover, our method demonstrates consistent performance in robustness evaluations under occlusion and illumination disturbances, highlighting its advantages and application potential in multimodal perception.
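
For orientation, the classical information bottleneck objective, which the name of the CIB module suggests it builds on, can be written as below. This is the textbook formulation and not the paper's loss: the symbols X (input features of one modality), Z (the compressed representation), Y (the information to be preserved, such as semantics shared with the other modality) and the trade-off weight β are assumptions for illustration, and the exact CIB objective is not stated in this summary.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimizing I(X; Z) compresses the input, while the second term rewards keeping whatever in Z is informative about Y; a larger β favors preservation over compression.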

Cite This Study

Tan et al. studied this question.

synapsesocial.com/papers/69b3abc502a1e69014ccce15 https://doi.org/10.1038/s41598-026-35339-2
