Knee osteoarthritis (KOA) remains the most prevalent form of osteoarthritis and a major cause of global disability. The Kellgren–Lawrence (KL) grading system, though widely used, suffers from inter- and intra-observer variability, especially in early disease stages. Artificial intelligence (AI) offers a transformative approach to automating KL grading on plain radiographs, providing consistent, reproducible, and scalable diagnostic solutions. This narrative review synthesizes recent advances in AI-based KL grading models, focusing on methodological frameworks, performance, clinical applicability, and limitations.

A literature search of PubMed, Embase, Web of Science, and Google Scholar identified peer-reviewed studies, published between 2016 and 2025, that applied AI-based methods to KL grading of KOA on radiographic images. Eligible studies satisfied predefined selection criteria. The review focused on model architectures, dataset characteristics, validation strategies, performance metrics, and comparisons with expert radiographic assessment.

Eighteen eligible studies were included. Convolutional neural networks (CNNs) remain the core of automated KL grading, having evolved from standard classification models to ensemble and ordinal regression frameworks. Model performance was evaluated against expert-assigned KL grades as the reference standard, with reported accuracies ranging from 75% to 98% and area under the curve (AUC) values of up to 0.98. Agreement with expert annotations, measured by Cohen's kappa (κ), ranged from 0.67 to 0.86. Deep Siamese networks, Faster R-CNNs, and ensemble frameworks have enhanced localization of KOA radiographic features, thereby improving interpretability relative to human radiologic assessment. Ordinal regression and attention-based visualization (saliency maps and class activation maps) reduced misclassification between adjacent KL grades.
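The abstract reports agreement as Cohen's kappa without stating a weighting scheme. Because KL grades are ordinal, weighted variants of kappa are a common choice: they penalize a KL 1 vs KL 2 disagreement less than a KL 0 vs KL 4 one, which is exactly the adjacent-grade confusion the ordinal models target. The pure-Python sketch below computes unweighted, linear, and quadratic kappa. The function name `cohen_kappa` and the toy grade lists are hypothetical illustrations, not data or code from the reviewed studies.

```python
from collections import Counter
from typing import Optional, Sequence


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                num_grades: int = 5, weights: Optional[str] = None) -> float:
    """Cohen's kappa between two raters' KL grades, coded 0..num_grades-1.

    weights=None        -> unweighted kappa: any disagreement costs 1.
    weights="linear"    -> cost |i - j| / (num_grades - 1).
    weights="quadratic" -> cost ((i - j) / (num_grades - 1)) ** 2.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty grade lists of equal length")
    n, k = len(rater_a), num_grades
    if any(not 0 <= g < k for g in (*rater_a, *rater_b)):
        raise ValueError("grades must lie in 0..num_grades-1")

    def cost(i: int, j: int) -> float:
        # Disagreement cost when rater A says grade i and rater B says grade j.
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return abs(i - j) / (k - 1)
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        raise ValueError(f"unknown weights: {weights!r}")

    joint = Counter(zip(rater_a, rater_b))           # observed grade pairs
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)

    # Weighted disagreement actually observed vs. expected by chance
    # (chance = independent raters with the same marginal grade frequencies).
    observed = sum(cost(i, j) * joint[(i, j)]
                   for i in range(k) for j in range(k)) / n
    expected = sum(cost(i, j) * marg_a[i] * marg_b[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


if __name__ == "__main__":
    expert = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
    model_near = [0, 2, 2, 3, 4, 1, 1, 2, 3, 3]  # 3 errors, each one grade off
    model_far = [0, 4, 2, 3, 4, 4, 1, 2, 3, 0]   # 3 errors, 3 or 4 grades off
    for name, pred in (("near", model_near), ("far", model_far)):
        print(name,
              round(cohen_kappa(expert, pred), 3),
              round(cohen_kappa(expert, pred, weights="quadratic"), 3))
```

Both toy models agree with the expert on 7 of 10 radiographs and share an unweighted kappa of 0.625, but the quadratic-weighted kappa is about 0.91 for the one-grade-off errors and about 0.05 for the distant errors. This is why the weighting scheme should be stated whenever kappa values are compared across studies.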
Persistent challenges included subjective ground-truth labeling; dataset imbalance, particularly under-representation of early (KL 0–1) and severe (KL 4) disease; and limited external validation. Models trained primarily on the Osteoarthritis Initiative and Multicenter Osteoarthritis Study datasets showed reduced generalizability on external hospital datasets.

AI-driven KL grading demonstrates near-human accuracy and strong promise for clinical integration. However, addressing labeling subjectivity, dataset diversity, and explainability remains essential for trustworthy deployment. While KL grading is inherently radiograph-based, integrating clinical metadata and longitudinal radiographic data may support more robust disease characterization. Federated learning frameworks offer a pathway to improving generalizability while preserving data privacy.
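The abstract names federated learning only as a pathway and does not specify an algorithm. One widely used scheme is federated averaging (FedAvg): each site trains on its own radiographs, shares only model parameters and its sample count, and a server averages the parameters weighted by sample count. The sketch below illustrates that protocol under loud assumptions: a toy linear model `y ≈ w*x + b` stands in for a KL-grading CNN, and `local_update`, `fedavg`, `federated_round`, and the three toy "hospital" datasets are all hypothetical.

```python
from typing import List, Sequence, Tuple

Params = List[float]
Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(params: Params, data: Dataset,
                 lr: float = 0.01, epochs: int = 20) -> Params:
    """Hypothetical local training: full-batch gradient descent on a toy
    linear model y ~ w*x + b with mean-squared-error loss. It stands in
    for training a KL-grading CNN inside one hospital."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        gw = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return [w, b]


def fedavg(site_params: Sequence[Params], site_sizes: Sequence[int]) -> Params:
    """Server-side aggregation: average of the site parameter vectors,
    weighted by each site's number of training samples."""
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one parameter vector per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(site_params[0])
    return [
        sum(n * p[d] for p, n in zip(site_params, site_sizes)) / total
        for d in range(dim)
    ]


def federated_round(global_params: Params, site_data: Sequence[Dataset]) -> Params:
    """One communication round: every site trains locally from the current
    global model; only parameters and sample counts leave the site."""
    updates = [local_update(list(global_params), data) for data in site_data]
    sizes = [len(data) for data in site_data]
    return fedavg(updates, sizes)


if __name__ == "__main__":
    # Toy private datasets for three hypothetical sites (never pooled).
    sites = [
        [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0)],
        [(x, 2.0 * x + 1.0) for x in (3.0, 4.0)],
        [(x, 2.0 * x + 1.0) for x in (5.0,)],
    ]
    params = [0.0, 0.0]
    for _ in range(50):
        params = federated_round(params, sites)
    print("global parameters after 50 rounds:", params)
```

Weighting by sample count keeps a small site from pulling the global model as hard as a large one; real deployments typically add secure aggregation and handle non-identically distributed data across hospitals, which this sketch omits.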
Saumya Rawat
Ved Chaturvedi
Binit Vaidya
SHILAP Revista de lepidopterología
Rawat et al. (Wed,) studied this question.
www.synapsesocial.com/papers/69f6e5868071d4f1bdfc63b2 — DOI: https://doi.org/10.1177/1759720x261442408