Remote sensing target detection benefits from multimodal data, namely RGB, infrared (IR), and synthetic aperture radar (SAR) imagery, yet most unimodal systems struggle with noise, occlusion, and low visibility, leaving a performance gap in complex scenes. To address these limitations, this research introduces the Scalable Penguin with Attention-Intelligent Deep Neural Network (SP-Att-IDeepNet), which is designed to handle cross-modal inconsistencies and strengthen feature learning. Histogram equalization is applied to the IR and SAR inputs for contrast enhancement, and a modified ResNet-50 backbone extracts unified semantic representations. The framework combines the global search ability of the Scalable Penguin (SP) optimizer with attention-driven deep feature refinement, improving convergence stability and detection reliability. Experimental evaluation on synchronized RGB–IR–SAR datasets demonstrates strong performance, achieving 97.03% mAP@0.5, 93.07% precision, 94.52% recall, and a 96% F1-score, surpassing existing approaches. The model's scalability and robustness make it well suited to real-time use in surveillance, disaster assessment, and environmental monitoring.
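The contrast-enhancement step can be illustrated with a minimal sketch of global histogram equalization. The abstract does not state which variant is used (global or adaptive, such as CLAHE) or the input bit depth, so this sketch assumes global equalization on single-channel 8-bit images, with SAR amplitude already quantized to 8 bits. The function name `equalize_histogram` is hypothetical and is not taken from the authors' code.

```python
import numpy as np


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a single-channel uint8 image.

    Assumption: IR/SAR frames have already been scaled to 8-bit grayscale.
    """
    if img.dtype != np.uint8 or img.ndim != 2:
        raise ValueError("expected a 2-D uint8 array")

    # Count pixels per intensity level (0..255) and accumulate into a CDF.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)

    # CDF value at the darkest intensity actually present in the image.
    cdf_min = cdf[np.nonzero(cdf)[0][0]]
    total = img.size
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return img.copy()

    # Map each level so the output CDF is approximately linear over [0, 255].
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)

    # Apply the lookup table to every pixel.
    return lut[img]
```

For example, a low-contrast patch whose values lie only between 10 and 30 is stretched to span the full 0 to 255 range, which is the intended effect on dim IR or speckled SAR inputs before feature extraction. An adaptive method would instead compute such mappings per tile, which tends to preserve local detail better in scenes with uneven illumination.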
S Z Zhang (Thu,) studied this question.