Abstract
Background: Autosomal dominant polycystic kidney disease (ADPKD), characterized by progressive cyst growth and renal decline, is the leading genetic cause of end‐stage renal disease.
Objective: This study aims to develop and validate machine learning (ML) models for predicting the risk of progression to dialysis in patients with ADPKD using a nationwide administrative database. Early identification of high-risk patients is critical for timely monitoring.
Methods: This retrospective cohort study used data from Taiwan’s National Health Insurance Research Database (2007‐2018) to identify newly diagnosed patients with ADPKD. Six ML algorithms, including logistic regression, random forest, and extreme gradient boosting (XGBoost), were employed to predict progression to dialysis. Models were developed using 10-fold cross-validation, with the Synthetic Minority Oversampling Technique applied within training folds to address class imbalance. An ensemble-based feature selection strategy was implemented to identify the most robust predictors and optimize final model performance. Model evaluation was conducted using a strict temporal split.
Results: The study included 1856 patients with ADPKD, of whom 302 (16.27%) progressed to dialysis. Multivariable Cox regression identified several significant risk factors, including age 66 years and older (hazard ratio [HR] 4.63, 95% CI 2.71‐7.92; P<.001), anemia (HR 4.33, 95% CI 3.25‐5.78; P<.001), congestive heart failure (HR 1.81, 95% CI 1.29‐2.54; P<.001), and acute kidney injury (HR 1.69, 95% CI 1.19‐2.41; P=.003). Among the ML models, the XGBoost model, using an optimized set of 27 features, demonstrated the highest predictive performance on the held-out temporal test set (accuracy 98.3%; area under the curve 0.955; F1-score 0.800; Brier score 0.022). The top predictors in the XGBoost model largely aligned with age, comorbidity burden, anemia, and cardiovascular disease markers. Medication use (eg, anticoagulants, loop diuretics, febuxostat) was also among the most influential predictors; however, medication-related predictors should be interpreted as proxies for disease complexity rather than direct risk modulators.
Conclusions: ML models can predict dialysis risk in patients with ADPKD using administrative data with temporal validation. This approach may support risk stratification by helping identify individuals at higher predicted risk who may warrant closer monitoring and further specialist evaluation.
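The abstract notes that the Synthetic Minority Oversampling Technique was applied within training folds. The point of that ordering is that synthetic rows are built only from training data, so each validation fold contains real patients only. The following is a minimal, standard-library-only sketch of that idea, not the authors' pipeline: the function names (`smote`, `kfold_indices`, `oversampled_folds`), the unstratified fold assignment, the neighbour count `k=5`, and the choice to oversample to a 1:1 ratio are all assumptions made for illustration. A real analysis would more likely rely on an established library implementation.

```python
import random

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []  # nothing to interpolate between
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def kfold_indices(n, n_splits, rng):
    """Shuffle 0..n-1 and deal the indices into n_splits validation folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]

def oversampled_folds(X, y, n_splits=10, seed=0):
    """Yield (X_train, y_train, X_val, y_val); only the training part is oversampled."""
    rng = random.Random(seed)
    for val_idx in kfold_indices(len(X), n_splits, rng):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Synthetic positives are derived from training positives only,
        # so no information from the validation fold leaks into training.
        pos = [x for x, label in zip(X_tr, y_tr) if label == 1]
        n_neg = len(y_tr) - len(pos)
        new_rows = smote(pos, max(0, n_neg - len(pos)), rng=rng)
        yield (X_tr + new_rows, y_tr + [1] * len(new_rows),
               [X[i] for i in val_idx], [y[i] for i in val_idx])
```

Each yielded training set can be passed to any classifier, while the untouched validation rows are used for scoring. Oversampling before splitting would instead let interpolated copies of validation patients into training and inflate the cross-validated estimates.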
Chang et al. studied this question.
www.synapsesocial.com/papers/69ba43cb4e9516ffd37a551a — DOI: https://doi.org/10.2196/80343
Cheng-Hao Chang
Mingchih Chen
Ming-Hsien Tsai
JMIR Medical Informatics