Background: Undetected cervical lesions can progress to cancer, a leading cause of mortality among women worldwide. While automated analysis of Papanicolaou (Pap) smear images using convolutional neural networks (CNNs) has demonstrated significant potential for screening, most existing studies rely on a single curated dataset. This limits our understanding of how well models generalize to the noise and variability inherent in real-world clinical cytology.

Methods: We evaluated three CNN architectures (VGG16, ResNet50, and InceptionV3) across four curated Pap smear datasets using stratified 5-fold cross-validation. For each dataset, the model achieving the highest mean Macro-F1 score was selected for further analysis. To assess robustness against domain shift, we performed an external evaluation using a non-curated Real-World dataset comprising routine clinical images.

Results: All architectures performed well on the curated benchmarks, with mean Macro-F1 scores ranging from 73.58% to 99.28%. However, performance dropped significantly when models were evaluated on the Real-World dataset (Macro-F1: 33.25–55.91%), highlighting the severity of the domain gap. Notably, the model trained on a combined heterogeneous dataset achieved the highest inter-domain performance, suggesting that data diversity improves robustness. Class-wise analysis revealed that high-grade lesions were the most sensitive to real-world variability.

Conclusions: Although CNNs achieve state-of-the-art results on curated benchmarks, their direct applicability to routine cytology workflows is hindered by domain shift. Our findings emphasize that evaluating models across heterogeneous, multi-source datasets is a prerequisite for reliable clinical deployment.
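
The selection procedure described in the Methods (stratified 5-fold splits, then choosing the architecture with the highest mean Macro-F1) can be illustrated with a minimal plain-Python sketch. This is not the authors' pipeline: the helper names `macro_f1`, `stratified_folds`, and `select_best` are hypothetical, the fold assignment is a simple unshuffled round-robin per class, and the scores in the usage lines are made-up placeholders rather than results from the paper.

```python
from collections import defaultdict
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, round-robin within each class,
    so every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def select_best(fold_scores):
    """Return the architecture name with the highest mean Macro-F1 across folds."""
    return max(fold_scores, key=lambda name: mean(fold_scores[name]))


# Usage with placeholder numbers (illustrative only, not from the paper).
example_scores = {
    "VGG16": [0.80, 0.90],
    "ResNet50": [0.95, 0.90],
    "InceptionV3": [0.85, 0.88],
}
best = select_best(example_scores)
```

Because Macro-F1 weights every class equally, a drop on a rare class such as high-grade lesions lowers the score as much as a drop on a common class, which is why it suits imbalanced cytology data better than plain accuracy.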
Sidnir Carlos Baia Ferreira
Romário Silva
Carlos André de Mattos Teixeira
PeerJ Computer Science
National Institute for Space Research
Universidade Federal do Pará
Instituto Federal de Educação, Ciência e Tecnologia do Pará
Ferreira et al. studied this question.
www.synapsesocial.com/papers/69b606ea83145bc643d1d55d — DOI: https://doi.org/10.7717/peerj-cs.3708
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: