Distant metastasis is the leading cause of death in renal cell carcinoma (RCC), yet accurate prediction tools remain lacking. We aimed to develop and validate a machine learning model to predict synchronous distant metastasis in RCC, defined as metastasis present at the time of initial diagnosis. We identified 106,448 patients with RCC in the SEER database (2010–2020) and divided them into training (n = 52,368), internal validation (n = 22,444), and external validation (n = 31,636) cohorts. Nine machine learning algorithms were compared using area under the curve (AUC), calibration, and decision curve analysis. Model interpretability was assessed using SHapley Additive exPlanations (SHAP). Distant metastasis was present in 10.7% of patients. N1 stage (odds ratio [OR] 8.28–9.46), tumor size > 7 cm (OR 6.30–7.72), T4 stage (OR 6.29–8.51), and sarcomatoid histology (OR 1.89–3.28) were independent risk factors, while chromophobe histology (OR 0.04–0.11) and multifocal tumors (OR 0.40–0.53) were protective. Gradient Boosting achieved AUCs of 0.906, 0.906, and 0.926 in the training, internal validation, and external validation cohorts, respectively. The model demonstrated good calibration and clinical utility across threshold probabilities of 5%–80%. Performance remained stable across subgroups and sensitivity analyses. A web calculator was developed (https://952307952pxw.shinyapps.io/RCC-Calculator/). In summary, we developed a machine learning model that accurately predicts synchronous distant metastasis at the time of initial RCC diagnosis, with good generalizability. The online calculator may assist clinicians in risk stratification and individualized decision-making.
Cheng et al. (Mon,) studied this question.
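Two of the evaluation metrics named in the abstract, AUC and the net benefit used in decision curve analysis, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the ten-patient toy data are hypothetical, and the net benefit uses the standard decision-curve definition (true positives per patient minus false positives per patient weighted by the odds of the threshold probability), which is assumed here rather than taken from the study.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC via the rank-sum identity: the fraction of (metastatic,
    non-metastatic) pairs in which the metastatic case scores higher."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Correctly ordered pairs count 1, tied pairs count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of acting on every patient whose predicted risk is at
    or above `threshold` (a probability strictly between 0 and 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: act on every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = synchronous distant metastasis, 0 = none.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
toy_risks = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40, 0.70, 0.90]

if __name__ == "__main__":
    print("AUC:", round(auc(toy_labels, toy_risks), 3))
    for pt in (0.05, 0.25, 0.50, 0.80):
        print(pt,
              round(net_benefit(toy_labels, toy_risks, pt), 3),
              round(net_benefit_treat_all(toy_labels, pt), 3))
```

A decision curve plots the model's net benefit against the treat-all and treat-none (net benefit 0) strategies over a range of threshold probabilities. A model is considered clinically useful at a given threshold when its curve lies above both references.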