February 18, 2026 · Open Access

Development and internal validation of a machine learning–based prediction model for pulmonary hypertension in COPD

Abstract

Background: Chronic obstructive pulmonary disease (COPD) is frequently complicated by pulmonary hypertension (PH), which worsens prognosis, but early detection of PH is limited by the invasiveness or suboptimal sensitivity of current diagnostic tools.

Methods: In this retrospective study, we analyzed 523 hospitalized patients with COPD from Beijing Chaoyang Hospital. After standardized preprocessing and recursive feature elimination, 18 routinely available noninvasive clinical and physiological variables were retained as predictors. Eight machine-learning algorithms were trained to predict PH and compared using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and decision-curve analysis; model interpretability was assessed with Shapley additive explanations (SHAP).

Results: The CatBoost model showed the best discrimination (AUC 0.848), with accuracy 0.830, sensitivity 0.758, specificity 0.866, and F1 score 0.746. SHAP analysis identified right ventricular diameter, pulmonary artery diameter, arterial partial pressure of carbon dioxide, right atrial transverse diameter, and age as the most influential predictors.

Conclusion: A CatBoost-based prediction model using readily obtainable noninvasive variables can estimate PH risk in COPD with good accuracy and provide transparent feature-level explanations, potentially facilitating earlier detection and risk-stratified management.
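The abstract compares models on one threshold-free metric (AUC) and four metrics that depend on a classification cutoff (accuracy, sensitivity, specificity, F1). The minimal sketch below shows how these quantities relate for any classifier that outputs a PH probability. It is not the authors' code: the label coding (1 = PH present), the 0.5 cutoff, and the function name `evaluate` are illustrative assumptions, since the abstract does not state which threshold was used. AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall for the PH-positive class
    specificity: float  # recall for the PH-negative class
    f1: float           # harmonic mean of precision and sensitivity
    auc: float          # threshold-free discrimination


def evaluate(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> BinaryMetrics:
    """Hypothetical helper: y_true uses 1 = PH, 0 = no PH (assumed coding);
    y_score holds predicted PH probabilities; threshold is an assumed cutoff."""
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")

    # Confusion-matrix counts at the chosen cutoff.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)

    # AUC = probability that a random PH case scores higher than a random
    # non-PH case; tied scores count as one half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return BinaryMetrics(accuracy, sensitivity, specificity, f1, auc)
```

On the toy labels `[1, 1, 1, 0, 0, 0, 0, 0]` with scores `[0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]`, the 0.5 cutoff yields 2 true positives, 1 false negative, 1 false positive, and 4 true negatives, so accuracy is 0.75, sensitivity 2/3, specificity 0.8, F1 2/3, and AUC 14/15. Moving the cutoff changes the first four numbers but never the AUC, which is why AUC is the natural headline measure of discrimination.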

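Decision-curve analysis, listed among the comparison methods above, asks whether acting on a model's predictions does more good than treating every patient or no patient. At a threshold probability p_t, the standard net benefit is TP/N − (FP/N) · p_t/(1 − p_t), where the odds term weights a false positive by how much harm a clinician accepts per true positive found. The sketch below is a generic, hypothetical illustration of that formula (function names and toy data are assumptions, not the study's code or results); "treat none" always has net benefit 0.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], pt: float) -> float:
    """Net benefit of intervening in patients whose predicted PH risk is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= pt and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= pt and t == 0)
    odds = pt / (1.0 - pt)  # exchange rate between false and true positives
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: every patient is treated as PH-positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def decision_curve(y_true: Sequence[int], y_score: Sequence[float],
                   thresholds: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Rows of (pt, model NB, treat-all NB, treat-none NB = 0)."""
    return [(pt, net_benefit(y_true, y_score, pt),
             net_benefit_treat_all(y_true, pt), 0.0)
            for pt in thresholds]
```

With toy labels `[1, 1, 1, 0, 0, 0, 0, 0]` and scores `[0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]`, the model's net benefit at p_t = 0.5 is 2/8 − 1/8 = 0.125, while treating everyone gives 0.375 − 0.625 = −0.25. A model is clinically useful over the range of thresholds where its curve lies above both reference strategies.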