Ion-adsorption rare-earth mining in southern China often leaves small, fragmented disturbances in rugged, forested terrain. UAV-based enforcement is difficult there because these disturbances are easily confused with bare ground, canopy gaps, and shadows. We propose YOLO11-MSCAM, an enhanced YOLO11m detector in which the original SPPF block at the backbone–neck junction is replaced by a Multi-Scale Convolution–Attention Module (MSCAM) that cascades channel attention, spatial attention, and multi-scale residual convolutions to strengthen context aggregation and suppress background clutter. We build a field-acquired UAV dataset, SIMA (0.05 m GSD; acquired September–November 2023), comprising 1630 non-overlapping 640 × 640 orthomosaic tiles split 1320/147/163 into training/validation/test sets; five-lens raw images (nadir and oblique) are additionally used as auxiliary training samples and for post-detection verification. On the test set, YOLO11-MSCAM achieves mAP@0.5 = 83.24%, mAP@0.5:0.95 = 58.29%, and F1 = 79.92%, outperforming the YOLO11m baseline and other detectors (YOLOv5m/6m/8m/9m/10m and Faster R-CNN with ResNet-50). With 19.67 M parameters, 67.34 GFLOPs at 640 × 640 input, and 45.86 FPS, it supports tile-based batch screening to prioritize suspicious sites for field checks and evidence collection.
Li et al. (Sat,) studied this question.
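The tiling and split figures in the abstract can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function names `tile_origins`, `tile_footprint_m`, and `split_shares` are hypothetical, and dropping partial edge tiles is an assumption made here for simplicity. Only the tile size (640 px), the GSD (0.05 m), and the subset counts come from the text.

```python
from typing import Iterator, Tuple

TILE = 640      # tile side in pixels (from the abstract)
GSD_M = 0.05    # ground sampling distance in metres per pixel (from the abstract)


def tile_origins(width: int, height: int, tile: int = TILE) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of full, non-overlapping tiles.

    Partial tiles at the right and bottom edges are dropped; that is an
    assumption of this sketch, not something the abstract states.
    """
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield (x, y)


def tile_footprint_m(tile: int = TILE, gsd_m: float = GSD_M) -> float:
    """Ground side length covered by one tile, in metres."""
    return tile * gsd_m


def split_shares(train: int, val: int, test: int) -> Tuple[float, float, float]:
    """Fraction of all tiles that falls in each subset."""
    total = train + val + test
    return (train / total, val / total, test / total)
```

Under these assumptions each 640 × 640 tile covers roughly 32 m × 32 m on the ground, and `split_shares(1320, 147, 163)` gives shares of about 81%, 9%, and 10% of the 1630 tiles.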