ABSTRACT Underwater object detection remains challenging because of complex backgrounds, large variations in object scale, and the need for real-time processing under limited computational resources. Conventional two-stage and one-stage object detectors rely on non-maximum suppression (NMS), while query-based end-to-end detectors employ transformer architectures; neither approach is well suited to deployment on resource-constrained underwater embedded devices. To address these challenges, we present ACFD-Net, a real-time detection framework built on YOLO26 and tailored to underwater scenarios. First, we introduce a complementary split efficient downsampling module that reduces redundancy while preserving essential contextual and local features. Second, we introduce an Efficient Channel Attention Mechanism (ECAM) that models directional receptive fields and fuses local residual features, improving the discrimination of low-contrast underwater targets while suppressing background noise. Most importantly, we propose an attention-guided cosine feature distillation strategy. Unlike conventional layer-wise feature distillation, our method aligns the ECAM-generated attention representations of the teacher and student networks using cosine similarity, enabling more effective knowledge transfer. Experimental results on the DUO dataset show that ACFD-Net improves mAP50 by 4.3% over the baseline while reducing the computational cost from 6.1 to 5.6 GFLOPs, and achieves a real-time inference speed of 155 FPS. Experiments on the URPC2020 dataset further demonstrate its strong generalization capability. Overall, ACFD-Net achieves an effective balance between detection accuracy and computational efficiency for real-time underwater object detection.
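To make the idea of cosine-based attention distillation concrete, the following is a minimal, framework-free sketch of a generic cosine distillation loss. It is not the authors' implementation. It assumes that each distilled layer yields one flattened attention map for the teacher and one for the student, that the two maps have the same length, and that the maps are paired by index. The function names `cosine_similarity` and `cosine_attention_distill_loss` are hypothetical. The loss for each pair is 1 minus the cosine similarity, averaged over pairs. Because cosine similarity ignores vector magnitude, the student is pushed to match the *direction* of the teacher's attention, not its absolute scale.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors (eps guards against zero norms)."""
    if len(a) != len(b):
        raise ValueError("teacher and student maps must have the same size")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def cosine_attention_distill_loss(
    teacher_maps: List[Sequence[float]], student_maps: List[Sequence[float]]
) -> float:
    """Mean of (1 - cosine similarity) over paired, flattened attention maps.

    Hypothetical setup: element i of each list is the flattened attention map
    produced at distilled layer i (e.g. one attention output per layer).
    """
    if len(teacher_maps) != len(student_maps) or not teacher_maps:
        raise ValueError("need the same, non-zero number of teacher and student maps")
    losses = [1.0 - cosine_similarity(t, s) for t, s in zip(teacher_maps, student_maps)]
    return sum(losses) / len(losses)


# Illustrative toy maps (made-up numbers, not data from the paper).
teacher = [[0.2, 0.8, 0.5], [1.0, 0.0, 0.0]]
student = [[0.4, 1.6, 1.0], [0.0, 1.0, 0.0]]  # pair 1: scaled copy; pair 2: orthogonal
loss = cosine_attention_distill_loss(teacher, student)
print(round(loss, 6))  # ≈ 0.5: pair 1 contributes ~0, pair 2 contributes 1
```

In a training pipeline, this term would typically be added to the detection loss with a weighting factor (hypothetical here, since the abstract does not specify one). The same computation would be done on tensors, for example with PyTorch's `torch.nn.functional.cosine_similarity`, so that gradients flow back into the student.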
An et al. (Mon,) studied this question.