Histopathological assessment of tissue biopsies is the primary means of diagnosing breast cancer. Interpreting histopathological images, however, is subjective and demands considerable effort from busy pathologists. Deep learning has transformed digital pathology, but there is currently no consensus on which architecture performs best for multiclass tissue recognition. This work systematically benchmarks two conventional Convolutional Neural Network (CNN) architectures, ResNet-101 and DenseNet-169, against the more recent Transformer-based architecture, the Vision Transformer (ViT). Our approach used a balanced dataset of images from four classes (Benign, InSitu, Invasive, and Normal), with images resized to a standardized input size of 224x224, transfer learning, and standard augmentations. Experimental results indicate that DenseNet-169 substantially outperforms ResNet-101 (75% accuracy), reaching an accuracy of 96.25% and an F1-score of 0.9628 at a comparatively low computational cost (67.169 GFLOPs). DeiT Base, a ViT variant, is also an effective diagnostic adjunct, but given its large parameter count (85.80M) and computational cost, optimized dense CNN architectures offer clear advantages in resource-limited clinical settings.
Eryuksel et al. (Tue,) studied this question.
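For readers who want to see how the two headline metrics, accuracy and F1-score, can be derived for a four-class problem, the following is a minimal sketch in plain Python. It is not the authors' evaluation code. The abstract does not state which F1 averaging was used, so macro averaging is an assumption here, and the labels in `y_true` and `y_pred` are hypothetical toy values chosen only to exercise the functions, not results from the study.

```python
from typing import List, Sequence

# Class names taken from the abstract; the index order is an assumption.
CLASSES = ["Benign", "InSitu", "Invasive", "Normal"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Build a matrix whose rows are true classes and columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, true class differs
        fn = sum(matrix[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical toy labels (two samples per class), for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]

matrix = confusion_matrix(y_true, y_pred, len(CLASSES))
print("accuracy:", accuracy(matrix))
print("macro F1:", macro_f1(matrix))
```

On a balanced dataset such as the one described, macro and weighted F1 averages coincide, so the choice of averaging matters less than it would with skewed class counts.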