AI models show potential to improve breast cancer screening; however, detailed subgroup evaluations that uncover the strengths and weaknesses of these models are lacking. This study presents a granular evaluation of a commercial AI model for cancer detection on digital breast tomosynthesis (DBT) in a retrospective cohort of 167,860 screening exams in female patients. Performance in distinguishing screen-detected cancers (1,368 exams) from negative exams (166,387 exams) is stratified across demographic, imaging, and pathologic subgroups to identify disparities. The overall AUROC is 0.91 and the overall sensitivity is 0.73, with robust performance across demographic subgroups. In situ cancers (AUROC: 0.85, sensitivity: 0.55), calcifications (AUROC: 0.80, sensitivity: 0.66), and dense breast tissue (AUROC: 0.88, sensitivity: 0.63) are associated with lower performance, while masses (AUROC: 0.93, sensitivity: 0.85) and architectural distortions (AUROC: 0.90, sensitivity: 0.83) are associated with higher performance. These results highlight the need for detailed evaluations and vigilance in adopting new clinical tools.
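As an illustration of what a stratified evaluation of this kind involves, the following is a minimal sketch, not the study's actual pipeline. It computes AUROC (via the Mann-Whitney pairwise formulation) and sensitivity at a fixed threshold within each subgroup. The record fields (`label`, `score`, `density`), the threshold of 0.5, and the toy records are all hypothetical and chosen only for demonstration. The study's operating point and its handling of negatives for lesion-level subgroups are not specified here.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None if either class is empty.
    The pairwise loop is O(P * N): fine for a sketch, slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """Fraction of positive exams with score >= threshold (None if no positives)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos:
        return None
    return sum(s >= threshold for s in pos) / len(pos)


def stratified_metrics(records, key, threshold):
    """Group records by records[i][key] and compute metrics per subgroup."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    results = {}
    for name, rows in groups.items():
        labels = [r["label"] for r in rows]
        scores = [r["score"] for r in rows]
        results[name] = {
            "n": len(rows),
            "auroc": auroc(labels, scores),
            "sensitivity": sensitivity(labels, scores, threshold),
        }
    return results


# Synthetic toy records (hypothetical, NOT data from the study).
toy_records = [
    {"label": 1, "score": 0.9, "density": "dense"},
    {"label": 1, "score": 0.3, "density": "dense"},
    {"label": 0, "score": 0.4, "density": "dense"},
    {"label": 0, "score": 0.1, "density": "dense"},
    {"label": 1, "score": 0.8, "density": "non-dense"},
    {"label": 1, "score": 0.7, "density": "non-dense"},
    {"label": 0, "score": 0.2, "density": "non-dense"},
    {"label": 0, "score": 0.6, "density": "non-dense"},
]

toy_results = stratified_metrics(toy_records, key="density", threshold=0.5)
```

Subgroups defined only for cancers, such as lesion type or histology, have no matching negatives of their own, so a variant of this sketch would need to compare each cancer subgroup against a shared pool of negative exams.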
Beatrice Brown-Mulry
Rohan Isaac
Sang Mook Lee Lee
Nature Communications
Emory University
Clemson University
Brown-Mulry et al. studied this question.
www.synapsesocial.com/papers/69bf86ecf665edcd009e916f — DOI: https://doi.org/10.1038/s41467-026-70637-3