Abstract

Introduction
The vision transformer (ViT) is a recent development in deep learning and an alternative to existing convolutional neural networks. ViTs can be used to classify, detect and segment images. Diabetic foot disease (DFD) is a complex disease and is associated with lower limb amputation. Magnetic resonance imaging (MRI) is commonly used in patients with DFD.

Aims
To explore the use of deep learning models to identify predictors of limb loss on MRI in patients with DFD, using sarcopenia as a potential surrogate marker.

Methods
Two-dimensional images of the foot at the base of the first metatarsal were classified as having mild, moderate or severe sarcopenia. A subset of 50 images was also graded by a musculoskeletal radiologist to establish the inter-rater reliability. In total, 824 images were annotated. Following data pre-processing and data augmentation, 1740 images were available for the deep learning models; these were split in a 70:20:10 ratio for training, validation and testing. A ViT model was applied to classify the images as mild, moderate or severe.

Results
The inter-rater reliability was 0.827 (95% c.i. 0.726–0.928; P = 0.001). The ViT had an accuracy of 78.7% and an F1 score of 79.9% in classifying sarcopenia severity on two-dimensional MRI images. The model had a high precision (81.5%) and recall (78.7%). The confidence threshold could be set to 51% without any deterioration in the model's performance.

Conclusions
The ViT is a useful deep learning model for classifying the severity of sarcopenia in patients with diabetic foot disease.
Ahmad et al. (Sun,) studied this question.
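To make the Methods and Results arithmetic concrete, the following is a minimal sketch of how a 70:20:10 split of 1740 images translates into set sizes, and of how accuracy, precision, recall and F1 could be computed for a three-class problem. The abstract does not state how the per-class metrics were averaged; support-weighted averaging is assumed here because it makes recall equal to accuracy, which is consistent with the reported 78.7% for both. The function names `split_counts` and `weighted_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def split_counts(n_images, ratios=(70, 20, 10)):
    """Return (train, validation, test) counts for an integer ratio split."""
    total = sum(ratios)
    train = n_images * ratios[0] // total
    val = n_images * ratios[1] // total
    test = n_images - train - val  # any rounding remainder goes to the test set
    return train, val, test


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: support-weighted averaging; the abstract does not say
    which averaging scheme was used.
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support[label]
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / n
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Set sizes implied by the reported 70:20:10 split of 1740 images.
print(split_counts(1740))  # (1218, 348, 174)

# Toy labels for illustration only; these are not study data.
y_true = ["mild", "mild", "moderate", "severe"]
y_pred = ["mild", "moderate", "moderate", "severe"]
print(weighted_metrics(y_true, y_pred))
```

With support weighting, each class's recall is multiplied by its share of the true labels, so the weighted sum reduces to the fraction of correctly classified images, which is the accuracy. Macro averaging would not have this property in general.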