Accurate prediction of progression to Alzheimer’s disease (AD) is crucial for early intervention and personalized patient management. In this study, we developed a robust, data-driven survival analysis pipeline to model time-to-progression to AD from a baseline status of cognitively normal (CN) or mild cognitive impairment (MCI), integrating cognitive, clinical, MRI and PET neuroimaging, and biospecimen features from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. The ADNI cohort can be regarded as a multi-center platform for multimodal data integration that jointly captures cognitive performance, MRI/PET imaging-sensor biomarkers, and biofluid biosensing assays within a unified prognostic framework. Accordingly, our pipeline is designed to be robust to cross-site and cross-instrument variability through harmonized preprocessing and quality-check-aware integration of heterogeneous multimodal data. We employed eXtreme Gradient Boosting (XGBoost) for survival prediction because it natively handles the missing values that are frequently observed in real-world clinical datasets. Our results show that strong predictive performance can be achieved with a minimal set of features: the model obtained a concordance index (C-index) of 0.92 using 13 features and 0.90 using only 4 features. These findings underscore the importance of multi-domain feature integration, transparent feature selection, and the inclusion of underexplored biomarkers such as lipid metabolites for prognostic modeling.
Palma et al. studied this question.
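To make the reported C-index values easier to interpret, the following is a minimal sketch of Harrell's concordance index for right-censored data. It assumes that a higher score means a higher predicted risk of progression; the function name `concordance_index` and the toy data are illustrative and do not come from the study, and the exact estimator used in the original pipeline is not specified here.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data.

    times       : follow-up time for each subject
    events      : 1 if progression was observed, 0 if censored
    risk_scores : model output, higher = higher predicted risk (assumption)
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0  # concordant pairs, with tied scores counted as 0.5
    comparable = 0    # pairs whose ordering in time is actually known

    for i in range(n):
        if not events[i]:
            # A censored subject cannot serve as the earlier member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy example (hypothetical values): four subjects, the third is censored.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_scores)
```

In the toy data, five pairs are comparable and four of them are ranked correctly, so the index is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.92 and 0.90 reported above.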