Lung cancer remains a leading cause of cancer-related death, largely because it is often detected at a late stage. Although medical imaging and biopsy-based evaluation have improved, early identification remains challenging. To address this, we propose a Vision Transformer (ViT)-based model for binary classification of lung nodules in computed tomography (CT) images. The study uses a Kaggle-hosted subset of the LIDC–IDRI dataset containing 315 CT nodule patches, with the original malignancy scores converted into binary benign and malignant labels. Because the dataset is small, an extensive augmentation pipeline was designed to improve model generalization. The lightweight ViT-Small/16 architecture achieved 92.3% accuracy, 90.5% precision, 93.8% recall, and a 92.1% F1-score. These results highlight the potential of compact transformer models for early lung cancer identification. This work is among the first to evaluate ViT-Small/16 on a small-scale CT nodule dataset using an augmentation strategy tailored to limited-data medical imaging.
Thakur et al. (Sun,) studied this question.
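As a quick consistency check on the metrics reported above, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` and the variable names are hypothetical and are not taken from the study's code; only the precision and recall values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.905
reported_recall = 0.938

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 92.1%"
```

The recomputed value agrees with the reported 92.1% F1-score, so the three figures are mutually consistent.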