As intelligent surveillance systems have become widely used in public safety and urban administration, insufficient detection accuracy caused by low resolution, noise interference, and dense target scenes has become a significant obstacle for current surveillance image analysis techniques. Traditional target detection and image enhancement methods are limited in small-target recognition, occluded-region processing, and detail restoration, making it difficult to meet the real-time and high-precision requirements of dense surveillance environments. To address these issues, this paper proposes a Transformer-based joint method for dense target detection and image sharpening. The method extracts global features through a shared encoder and introduces a query-driven target detection branch to accurately localize overlapping targets; in parallel, an image sharpening branch restores details of low-quality images through residual fusion, structure preservation constraints, and a perceptual loss. The two branches are jointly trained through multi-task collaborative optimization so that they enhance each other in the shared feature space. Experimental results show that, in low-light, high-density scenes, the proposed method improves the mean average precision (mAP) from 62.4% to 75.2% and the recall from 68.5% to 81.3% relative to the single-task baseline. For image sharpening, the peak signal-to-noise ratio (PSNR) increases from 24.8 dB to 28.5 dB, the structural similarity index (SSIM) rises from 0.78 to 0.86, and the structure preservation error decreases from 0.092 to 0.045, indicating a markedly improved ability to recover target edges and textures in low-quality images.
The method thus addresses the problems of inaccurate target localization and missing detail information in low-resolution dense surveillance images, providing a feasible solution for high-precision target recognition and image enhancement in intelligent surveillance systems.
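To put the reported PSNR gain in perspective, the following minimal sketch uses the standard PSNR definition, 10·log10(MAX² / MSE), to convert a gain in dB into the implied reduction in mean squared error. It is not the paper's evaluation code: the function names, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values.

    Assumes 8-bit intensities by default (max_value=255); this is an
    illustrative choice, not a setting taken from the paper.
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_psnr_gain(psnr_before_db, psnr_after_db):
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak value."""
    return 10.0 ** ((psnr_after_db - psnr_before_db) / 10.0)


# PSNR values reported in the abstract: 24.8 dB (baseline) and 28.5 dB (proposed).
ratio = mse_ratio_from_psnr_gain(24.8, 28.5)
print(f"{ratio:.2f}")  # prints 2.34
```

Under this definition, the 3.7 dB improvement corresponds to a mean squared error roughly 2.3 times lower than that of the single-task baseline, assuming both values were measured against the same reference images and peak value.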
Tian et al. (Thu,) studied this question.