March 7, 2026 · Open Access

CT-Derived Body Composition and Machine Learning for Prognostic Improvement in Colorectal Cancer

Key Points

The study evaluates machine learning models for predicting overall survival in colorectal cancer using clinical data and CT-derived body composition features.
Retrospective cohort of 1553 colorectal cancer patients, of whom 915 met data quality criteria and entered the primary analysis.
Models were trained on 31 clinical and laboratory variables; a CT-imaging subset (n = 448) added 39 body composition features derived from skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue.
Principal component analysis (PCA) was used for dimensionality reduction, with components selected using SHAP feature importance and recursive feature inclusion.
Using clinical variables alone, logistic regression achieved an AUC of 0.86 in the full cohort and 0.84 in the CT subset.
Before dimensionality reduction, boosted decision trees showed lower feature selection stability (JSD = 0.65) than logistic regression (JSD = 0.80).
Body composition features did not improve AUC, but the analysis revealed a trade-off between predictive accuracy and feature stability.
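
For readers less familiar with the AUC values quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted risk scores. It uses the pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs in which the
    positive case has the higher score. Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = event observed, 0 = no event.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.7, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The quadratic loop is kept for readability; a rank-based implementation is preferable for large cohorts.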

Abstract

Translational research in cancer theranostics aims to transform routinely acquired clinical and imaging data into robust and clinically actionable biomarkers. In colorectal cancer (CRC), computed tomography (CT) is widely used for staging and follow-up and represents a valuable source of quantitative body composition biomarkers. However, the integration of high-dimensional radiomics into predictive models faces challenges related to performance, interpretability, and feature stability.

This study aimed to evaluate the prognostic performance and stability of machine learning (ML) models for overall survival prediction in CRC using clinical data and CT-derived body composition features.

A retrospective cohort of 1553 CRC patients was analyzed; 915 met data quality criteria and were included in the primary analysis, in which ML models were trained using 31 clinical and laboratory variables. A CT-imaging subset (n = 448) was subsequently evaluated to assess model robustness and to incorporate 39 body composition radiomics features derived from skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue. To address high dimensionality, a principal component analysis (PCA)-based reduction pipeline was implemented, with clinically relevant components selected using SHAP feature importance and recursive feature inclusion. At each step, feature ranking stability was quantified using the Jensen-Shannon divergence (JSD) across 1200 iterations.

Using clinical variables alone, logistic regression (LR) achieved an area under the ROC curve (AUC) of 0.86 in the full cohort and 0.84 in the CT subset. Prior to dimensionality reduction in the CT subset, LR slightly outperformed boosted decision trees (BDT) in discriminative power, while BDT exhibited lower feature selection stability (JSD = 0.65) than LR (JSD = 0.80). The inclusion of body composition features did not improve AUC; however, radiomics variables such as skeletal muscle predicted area, skeletal muscle median radiodensity, and visceral adipose tissue 90th percentile consistently ranked among the most relevant predictors. After PCA-based dimensionality reduction, LR maintained its discriminative performance (AUC = 0.84) with a moderate decrease in stability (JSD = 0.71), whereas BDT showed a marked improvement in stability (JSD = 0.81) despite a minor reduction in AUC. These results indicate a clear trade-off between discrimination and robustness, with BDT demonstrating greater gains in stability in lower-dimensional feature spaces.

ML models demonstrated consistent performance in predicting overall survival in CRC using routinely available clinical data. The inclusion of CT-derived body composition features did not improve discriminative performance but revealed a clear trade-off between predictive accuracy and feature stability. Dimensionality reduction through PCA preserved model performance and may play an important role in improving model interpretability and stability, which are essential for the translational application of theranostic models.
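
As a rough illustration of how feature-importance distributions from repeated model fits can be compared, the sketch below computes the standard base-2 Jensen-Shannon divergence between normalized importance vectors and averages it over all pairs of runs. This is an assumption-laden example, not the authors' pipeline: the abstract does not state how the reported JSD values are computed, normalized, or oriented with respect to stability, and the helper names (`normalize`, `jsd`, `mean_pairwise_jsd`) and the toy importance vectors are hypothetical.

```python
import math
from itertools import combinations


def normalize(importances):
    """Turn absolute importance values into a probability distribution."""
    total = sum(abs(v) for v in importances)
    if total == 0:
        raise ValueError("importance vector must not be all zeros")
    return [abs(v) / total for v in importances]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Base-2 Jensen-Shannon divergence: 0 for identical distributions,
    1 for distributions with no overlapping support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_pairwise_jsd(runs):
    """Average JSD over all pairs of importance vectors (one per iteration)."""
    dists = [normalize(r) for r in runs]
    pairs = list(combinations(dists, 2))
    return sum(jsd(p, q) for p, q in pairs) / len(pairs)


# Hypothetical importance vectors for three features over three iterations.
runs = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.20, 0.30, 0.50],
]
spread = mean_pairwise_jsd(runs)
print(f"mean pairwise JSD = {spread:.3f}")
```

In this plain formulation, lower values mean the importance distributions agree more closely across iterations. How such a quantity maps onto the stability figures reported in the abstract depends on the study's own definition.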

Authors

Pedro Henrique Alves

Vinicius Barbosa Bassete

Jun Takahashi

Journals

Hematology Transfusion and Cell Therapy

Institutions

Universidade Estadual de Campinas (UNICAMP)
