Unsupervised visible-infrared person re-identification (USVI-ReID) is a challenging retrieval task that matches pedestrian images across the visible and infrared modalities without using any label information. In this task, the large cross-modality discrepancy makes it difficult to generate reliable cross-modality labels, and the lack of annotations makes learning modality-invariant features even harder. To facilitate this unsupervised cross-modal learning, we start by exploiting the information contained in the cross-modality input and its predicted label. To minimize information loss, we optimize the model by combining entropy minimization, a uniform label distribution, and cross-modality matching. Our approach uses an iterative training loop that alternates between model training and cross-modality matching, in which a uniform-prior-guided optimal transport assignment selects matched visible and infrared prototypes. This matching information is then used to minimize the intra- and cross-modality entropy. As a result, our model gradually learns useful information on its own and generates discriminative representations for unlabeled cross-modal data. Extensive experiments on benchmarks demonstrate the effectiveness of our method, e.g., Rank-1 accuracies of 69.4% on SYSU-MM01 and 89.4% on RegDB without any annotations. The code will be released soon.
Zheng Zhang
Jiaqi Chen
Xin Tan
IEEE Transactions on Image Processing
East China Normal University
www.synapsesocial.com/papers/69d895206c1944d70ce06186 — DOI: https://doi.org/10.1109/tip.2026.3680065