Translational research in cancer theranostics aims to transform routinely acquired clinical and imaging data into robust and clinically actionable biomarkers. In colorectal cancer (CRC), computed tomography (CT) is widely used for staging and follow-up and represents a valuable source of quantitative body composition biomarkers. However, the integration of high-dimensional radiomics into predictive models faces challenges related to performance, interpretability, and feature stability. This study aimed to evaluate the prognostic performance and stability of machine learning (ML) models for overall survival prediction in CRC using clinical data and CT-derived body composition features. A retrospective cohort of 1553 CRC patients was analyzed; 915 met data quality criteria and were included in the primary analysis, in which ML models were trained using 31 clinical and laboratory variables. A CT-imaging subset (n = 448) was subsequently evaluated to assess model robustness and to incorporate 39 body composition radiomics features derived from skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue. To address high dimensionality, a principal component analysis (PCA)-based reduction pipeline was implemented, with clinically relevant components selected using SHAP feature importance and recursive feature inclusion. At each step, feature ranking stability was quantified using the Jensen-Shannon Divergence (JSD) across 1200 iterations. Using clinical variables alone, logistic regression (LR) achieved an area under the ROC curve (AUC) of 0.86 in the full cohort and 0.84 in the CT subset. Prior to dimensionality reduction in the CT subset, LR slightly outperformed boosted decision trees (BDT) in discriminative power, and BDT exhibited lower feature selection stability (JSD = 0.65) than LR (JSD = 0.80).
The inclusion of body composition features did not improve AUC; however, radiomics variables such as skeletal muscle predicted area, skeletal muscle median radiodensity, and visceral adipose tissue 90th percentile consistently ranked among the most relevant predictors. After PCA-based dimensionality reduction, LR maintained its discriminative performance (AUC = 0.84) with a moderate decrease in stability (JSD = 0.71), whereas BDT showed a marked improvement in stability (JSD = 0.81) despite a minor reduction in AUC. These results indicate a clear trade-off between discrimination and robustness, with BDT demonstrating greater gains in stability in lower-dimensional feature spaces. Machine learning models demonstrated consistent performance in predicting overall survival in CRC using routinely available clinical data. The inclusion of CT-derived body composition features did not improve discriminative performance but revealed a clear trade-off between predictive accuracy and feature stability. Dimensionality reduction through PCA preserved model performance and may play an important role in improving model interpretability and stability, which are essential for the translational application of theranostic models.
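The stability measurement described in the abstract (comparing feature rankings across repeated iterations with the Jensen-Shannon Divergence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization of importance scores into probability distributions, and the averaging over all pairs of runs are assumptions made for illustration. The abstract also does not state how its reported stability values are derived from the divergence; the sketch below computes the raw base-2 divergence, for which 0 means identical importance profiles and 1 means completely disjoint ones.

```python
import math
from itertools import combinations


def normalize(importances):
    """Turn non-negative importance scores into a probability distribution."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    return [v / total for v in importances]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Base-2 JSD: 0 for identical distributions, 1 for disjoint ones."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_pairwise_jsd(importance_runs):
    """Average JSD over all pairs of runs (hypothetical aggregation choice).

    Each run is a list of importance scores, one per feature, in a fixed
    feature order (for example, mean absolute SHAP values from one iteration).
    """
    dists = [normalize(run) for run in importance_runs]
    pairs = list(combinations(dists, 2))
    return sum(jensen_shannon_divergence(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up importance scores for three features over three iterations.
    runs = [
        [0.50, 0.30, 0.20],
        [0.45, 0.35, 0.20],
        [0.55, 0.25, 0.20],
    ]
    print(mean_pairwise_jsd(runs))
```

Averaging over all pairs scales quadratically with the number of runs; with the 1200 iterations mentioned above, comparing each run against a single reference distribution would be a cheaper alternative, again as an assumption rather than the published procedure.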
Pedro Henrique Alves
Vinicius Barbosa Bassete
Jun Takahashi
Hematology Transfusion and Cell Therapy
Universidade Estadual de Campinas (UNICAMP)
Alves et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69abc1535af8044f7a4e9dca — DOI: https://doi.org/10.1016/j.htct.2026.106275