Self-supervised contrastive learning has become an effective approach for visual representation learning when large-scale annotation is impractical. In this study, we evaluate three widely used methods—SimCLR, MoCo v2, and BYOL—for large-scale stock keeping unit (SKU) recognition in retail environments. Experiments are conducted on the RP2K benchmark and a domain-specific in-house dataset (InSKU) using both linear probing and full fine-tuning. Under the original RP2K configuration with extended self-supervised pre-training, SimCLR achieves the highest Top-1 accuracy under linear evaluation (94.98%). In contrast, BYOL attains the highest performance under full fine-tuning (99.22% Top-1 accuracy). After filtering and deduplicating the dataset to reduce class imbalance and near-duplicate samples, MoCo v2 achieves competitive, and in some cases superior, linear performance under a reduced training budget. Cross-domain evaluation on InSKU indicates that SimCLR generalises more effectively under frozen-encoder constraints, whereas BYOL and MoCo v2 require full adaptation. These results highlight the sensitivity of contrastive representations to dataset composition, optimisation regime, and domain shift, providing practical guidance for deployment in dynamic retail settings.
Wiktor Kępiński
Grzegorz Sarwas
Applied Sciences
Warsaw University of Technology
Omnikon (Poland)
Kępiński et al. studied this question.
www.synapsesocial.com/papers/69ba422e4e9516ffd37a2211 — DOI: https://doi.org/10.3390/app16062810