Uterine fibroids are common benign tumors that arise from the smooth muscle layer of the uterus and are often associated with reproductive complications. Early identification supports timely management and helps prevent complications such as infertility. Diagnosis can be challenging, however, because fibroid symptoms overlap with those of other gynecological conditions, such as adenomyosis and ovarian cysts, which may lead to delayed diagnosis or misdiagnosis. In this study, we propose a two-stage hierarchical deep learning framework for ultrasound images that combines a Detection Transformer (DETR) for fibroid localization with an ensemble of Vision Transformers for classification. The dataset was split at the image level before patch extraction, and fibroid and normal tissue patches were then generated separately within each subset, so that patches from a single image never appear in both training and evaluation data. This two-stage approach enables improved localization while addressing class imbalance and preserving relevant contextual information in the ultrasound images. The proposed framework achieved high performance on the available dataset, with an accuracy of 99.42%, a precision of 99.10%, a recall of 99.36%, an F1 score of 99.42%, and an area under the curve (AUC) of 99.9%. However, the dataset contains few normal cases and does not include patient-level identifiers, so images from the same patient may fall into different subsets; both factors may affect the generalizability of the results. Further validation on larger and more diverse clinical datasets is therefore necessary before clinical deployment.
Ogie et al. (Fri,) studied this question.
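
The image-level split described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`split_image_ids`, `crop`, `extract_patches`), the split fractions, the box format, and the toy data are all assumptions made for illustration. The point it shows is the ordering: image identifiers are partitioned first, and patches are cropped only afterwards, within each subset.

```python
import random

def split_image_ids(image_ids, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle image IDs once and cut them into disjoint train/val/test lists.

    The fractions and seed are illustrative assumptions, not values from the study.
    """
    ids = sorted(image_ids)                 # deterministic starting order
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

def crop(image, box):
    """Crop a patch from a 2D image (list of rows); box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def extract_patches(images, boxes, split):
    """Generate labeled patches per subset, only after the image-level split."""
    patches = {}
    for subset, ids in split.items():
        patches[subset] = [
            (image_id, label, crop(images[image_id], box))
            for image_id in ids
            for label, box in boxes[image_id]
        ]
    return patches

# Hypothetical toy data: 20 blank 8x8 "images", each with one fibroid box and one normal box.
images = {f"img_{i:02d}": [[0] * 8 for _ in range(8)] for i in range(20)}
boxes = {k: [("fibroid", (0, 0, 4, 4)), ("normal", (4, 4, 8, 8))] for k in images}

split = split_image_ids(images.keys())
patches = extract_patches(images, boxes, split)

# Collect the source images behind each subset's patches to check for overlap.
sources = {name: {p[0] for p in items} for name, items in patches.items()}
assert not (sources["train"] & sources["val"])
assert not (sources["train"] & sources["test"])
assert not (sources["val"] & sources["test"])
```

Because the partition is made over image identifiers, no image can contribute patches to more than one subset. Without patient-level identifiers, as noted above, the same guarantee cannot be made per patient; if such identifiers were available, the same sketch would apply with patient IDs in place of image IDs.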