To the Editor: Lung transplantation (LTx) is a lifesaving procedure for end-stage pneumoconiosis,1 a prevalent occupational disease in China2 that poses unique surgical challenges, including dense pleural adhesions and a high risk of intraoperative hemorrhage. Despite the growing procedural volume, robust prognostic tools specifically tailored to this high-risk population remain scarce. To assist clinicians in early postoperative risk stratification and in guiding intensive care unit (ICU) management, and to address the transparency gap of existing models, we developed and validated an interpretable machine learning framework for predicting 30-day postoperative mortality in lung transplant recipients with pneumoconiosis, incorporating SHapley Additive exPlanations (SHAP) to quantify individualized risk contributions and support clinical decision-making.3

This retrospective study analyzed clinical data from 584 recipients recorded in the China Lung Transplantation Registry (CLuTR, https://clutr.cotr.cn) between 2018 and 2025. Perioperative candidate predictors comprised 36 routinely collected variables, including preoperative recipient characteristics, donor-related factors, and intraoperative parameters. Patients with more than 20% missing data across candidate variables were excluded. Little’s test supported a missing-completely-at-random mechanism (P >0.05) in each set, and missing values were then imputed separately within the training, validation, and temporal test sets using k-nearest neighbors (KNN). To rigorously assess generalizability, participants were temporally stratified into a derivation cohort (2020–2025), which was randomly split into training (n = 349) and internal validation (n = 149) sets, and an independent temporal test cohort (2018–2019, n = 86) (Supplementary Figure 1, https://links.lww.com/CM9/C842).
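The temporal stratification and per-set KNN imputation described above can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function names, the `year` field, the 70/30 split ratio, the distance definition, and `k = 5` are all assumptions made for illustration, and features would need to be scaled before distances are meaningful in practice.

```python
import math
import random


def temporal_split(records, test_years=(2018, 2019), train_frac=0.7, seed=0):
    """Return (training, internal validation, temporal test) lists.

    Records from `test_years` form the temporal test set; the remaining
    records are shuffled and split at `train_frac` (an assumed ratio).
    """
    test = [r for r in records if r["year"] in test_years]
    derivation = [r for r in records if r["year"] not in test_years]
    rng = random.Random(seed)
    rng.shuffle(derivation)
    n_train = round(train_frac * len(derivation))
    return derivation[:n_train], derivation[n_train:], test


def knn_impute(rows, k=5):
    """Fill None entries using only the rows passed in (one call per set).

    Each missing value is replaced by the mean of that feature over the k
    nearest rows in which the feature is observed. Distance is the root mean
    squared difference over features observed in both rows (an assumption).
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    distance = math.sqrt(sum(shared) / len(shared))
                    candidates.append((distance, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy example: the second row is missing its second feature.
toy_rows = [[1.0, 10.0], [1.1, None], [5.0, 50.0], [1.2, 12.0]]
toy_filled = knn_impute(toy_rows, k=2)
```

With 498 derivation records and `train_frac=0.7`, this split yields 349 and 149 records, which matches the reported set sizes; whether the authors used exactly this ratio is not stated. Calling `knn_impute` once per set keeps information from the validation and test sets out of the training data.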
Ten machine learning algorithms were trained and optimized using repeated 10-fold cross-validation to benchmark their performance against standard logistic regression (LR) (Supplementary Tables 1 and 2, https://links.lww.com/CM9/C842). Model interpretability was achieved using SHAP to quantify individual feature contributions. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Wuxi People’s Hospital (No. 2024-KY24164), with informed consent waived. Detailed methodological protocols are provided in the Supplementary Methods, https://links.lww.com/CM9/C842.

Demographic characteristics were well balanced across the training (n = 349), internal validation (n = 149), and independent temporal test (n = 86) cohorts, with observed 30-day mortality rates of 12.0%, 11.5%, and 8.0%, respectively (Supplementary Table 3, https://links.lww.com/CM9/C842). Feature selection was conducted through a two-step process: least absolute shrinkage and selection operator (LASSO) regularization was first applied to screen candidate variables and mitigate multicollinearity (Supplementary Figures 2A and B, https://links.lww.com/CM9/C842); in parallel, univariable (P 0.90). However, this apparent superiority masked marked overfitting: for LightGBM and CatBoost, respectively, the area under the receiver operating characteristic curve (AUC) decreased to 0.632 and 0.611 in the internal validation set and to 0.732 and 0.635 in the temporal test set, with corresponding F1 scores falling from 0.875 and 0.845 in the training set to 0.400 in both the validation and temporal test sets (Supplementary Table 5, https://links.lww.com/CM9/C842). In contrast, LR exhibited superior generalizability.
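The drop in discrimination between training and validation data described above can be quantified with two small metric functions. The following standard-library sketch computes the AUC as a pairwise concordance probability and the F1 score from binary predictions; the function names and the toy data are illustrative assumptions and do not reproduce the study's results.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when nothing is positive."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def generalization_gap(train, valid):
    """Training AUC minus validation AUC; each argument is (labels, scores)."""
    return auc(*train) - auc(*valid)


# Toy data only: perfect separation in training, weaker in validation.
toy_train = ([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
toy_valid = ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_gap = generalization_gap(toy_train, toy_valid)  # 1.0 - 0.75 = 0.25
```

A large positive gap is the pattern that signals overfitting; a model whose validation and test AUCs stay close to its training AUC is generalizing more stably.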
LR maintained robust discrimination in both the internal validation and temporal test sets (AUC = 0.787 and 0.805, respectively, with stable F1 scores of 0.70–0.73), and calibration curves indicated close agreement between predicted probabilities and observed outcomes (Figure 1 and Supplementary Table 5, https://links.lww.com/CM9/C842). The DeLong test indicated that the AUC of LR was noninferior to, and in several instances significantly higher than, those of the more complex algorithms in the temporal test cohort (Supplementary Table 4, https://links.lww.com/CM9/C842).

Figure 1: Performance evaluation of the 10 machine learning classification models for predicting 30-day mortality. (A) ROC curves for the 10 machine learning classifiers in the training set. (B) ROC curves for the 10 classifiers evaluated using the internal validation set. (C) ROC curves for the 10 classifiers evaluated using the temporal test set. (D) Calibration curve of the LR model using the training set. (E) Calibration curve of the LR model using the internal validation set. (F) Calibration curve of the LR model using the temporal test set. (G) DCA curve of the LR using the training set. (H) DCA curve of the LR using the internal validation set. (I) DCA curve of the LR using the temporal test set. DCA: Decision curve analysis; GBM: Gradient boosting machine; KNN: k-nearest neighbors; LR: Logistic regression; ROC: Receiver-operating characteristic; SVM: Support vector machine.

To address the interpretability barrier, SHAP analysis was applied to the final model to translate algorithmic outputs into clinically actionable insights. The SHAP summary plot quantitatively ranked feature importance, identifying intraoperative blood loss as the strongest predictor of 30-day mortality,4 followed by cardiac output (CO) and donor smoking history (Supplementary Figure 3, https://links.lww.com/CM9/C842).5 In contrast to linear coefficients, SHAP dependence plots revealed critical nonlinear thresholds.
For example, mortality risk remained relatively stable until intraoperative blood loss exceeded 3000 mL, beyond which the risk increased sharply. Similarly, a protective range was observed for CO above 3.6 L/min; below this threshold, the risk contribution increased steeply (Supplementary Figure 4, https://links.lww.com/CM9/C842). Moreover, the stratified SHAP analysis revealed clinically important interaction effects that global models may overlook. When stratified by cardiac function, the impact of hemorrhage was heterogeneous: in patients with compromised cardiac reserve (CO <3.6 L/min), the risk slope associated with blood loss was significantly steeper than in those with preserved cardiac function, suggesting a reduced hemodynamic tolerance for volume depletion in this subgroup. Furthermore, the CO-stratified analysis indicated that intraoperative blood loss was the most significant predictor (Supplementary Figure 5, https://links.lww.com/CM9/C842).

To facilitate immediate clinical translation, the LR model was integrated into an open-access web calculator. This tool requires only five core variables to provide real-time, individualized risk stratification and visual explanations, bridging the gap between complex statistical derivation and bedside decision-making (Supplementary Figure 6, https://links.lww.com/CM9/C842; https://doctorlau.shinyapps.io/pneu-LR-model-risk/).

This study provides several clinically significant insights. First, we demonstrate that a parsimonious model, constructed using only five routinely available variables, can effectively predict 30-day mortality in LTx recipients with pneumoconiosis, with performance that held in both internal and temporal validation.
Second, by integrating LR with SHAP, we address the “black-box” issue commonly associated with conventional machine learning, providing transparent, patient-specific explanations that quantify the impact of intraoperative blood loss, CO, intraoperative hypotension, type of LTx, and donor smoking history on early postoperative risk. Third, SHAP-derived thresholds and interaction patterns highlight the importance of intraoperative hemorrhage control and hemodynamic optimization as key modifiable determinants of outcomes. Collectively, these findings support the integration of this interpretable risk model and its web-based calculator into immediate postoperative decision-support processes, specifically to identify high-risk recipients with pneumoconiosis who require enhanced surveillance and hemodynamic optimization.

This study is subject to several limitations. First, to ensure the model’s applicability immediately upon ICU admission, we restricted the feature set to preoperative and intraoperative variables. Consequently, the model does not account for postoperative physiological evolution or new-onset complications (e.g., the grading of primary graft dysfunction at 24–72 hours, or subsequent infections). Although this design enables rapid “zero-hour” risk stratification, it renders the prediction static. Future iterations could incorporate these time-dependent variables to create a dynamic model that updates risk assessments throughout the ICU stay. Second, the retrospective nature of the study and its reliance on registry data may introduce selection bias, residual confounding, and information bias, despite rigorous efforts to address missing data. Moreover, although SHAP values substantially improved model interpretability and provided clinically intuitive thresholds, the findings require validation in prospective, preferably multicenter, studies.
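The patient-specific explanations discussed above rest on the additive property of SHAP: a baseline plus the per-feature contributions equals the model output. For a logistic model with features treated as independent, the contribution of feature j on the log-odds scale reduces to its coefficient times the deviation from the background mean. The sketch below illustrates only that identity. Every coefficient, mean, and variable encoding in it is hypothetical and is not the published model, and the authors' actual SHAP configuration (described in their Supplementary Methods) may differ, for example by explaining predicted probabilities rather than log-odds.

```python
import math

# Hypothetical log-odds coefficients for the five predictor names.
# These numbers are invented for illustration; they are NOT the study's model.
INTERCEPT = -2.0
COEFS = {
    "blood_loss_l": 0.5,           # intraoperative blood loss, litres
    "cardiac_output_l_min": -0.6,  # cardiac output, L/min
    "hypotension": 0.8,            # intraoperative hypotension, 0/1
    "bilateral_ltx": 0.4,          # type of LTx, 0/1 (assumed encoding)
    "donor_smoking": 0.5,          # donor smoking history, 0/1
}

# Hypothetical background (reference cohort) means for each feature.
BACKGROUND_MEANS = {
    "blood_loss_l": 2.0,
    "cardiac_output_l_min": 4.5,
    "hypotension": 0.3,
    "bilateral_ltx": 0.5,
    "donor_smoking": 0.2,
}


def log_odds(patient):
    """Linear predictor of the logistic model."""
    return INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)


def predicted_risk(patient):
    """Predicted probability from the logistic link."""
    return 1.0 / (1.0 + math.exp(-log_odds(patient)))


def linear_shap(patient):
    """Per-feature contributions on the log-odds scale.

    Assumes independent features: phi_j = beta_j * (x_j - mean_j).
    """
    return {name: COEFS[name] * (patient[name] - BACKGROUND_MEANS[name])
            for name in COEFS}


# Baseline: the model output at the background means.
BASE_LOG_ODDS = log_odds(BACKGROUND_MEANS)

# A hypothetical recipient with heavy blood loss and low cardiac output.
example_patient = {
    "blood_loss_l": 4.0,
    "cardiac_output_l_min": 3.2,
    "hypotension": 1,
    "bilateral_ltx": 1,
    "donor_smoking": 0,
}
contributions = linear_shap(example_patient)
```

The baseline plus the sum of `contributions` equals `log_odds(example_patient)`, which is the additivity that makes a per-patient breakdown possible. On this scale each contribution is linear in its feature; contributions expressed on the probability scale are not.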
In conclusion, we have developed and temporally validated an interpretable LR model comprising five variables, which robustly predicts 30-day postoperative mortality in patients with pneumoconiosis undergoing LTx. The model’s implementation as a lightweight web-based calculator facilitates bedside application, supporting individualized postoperative risk assessment, ICU resource allocation, and targeted optimization of modifiable factors. These findings indicate that explainable machine learning can be effectively integrated into routine care for this high-risk population.

Funding

This study was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (No. 2023ZD0505900), and the National Key Research and Development Program of China (No. 2023YFC2507100).

Conflicts of interest

None.
Liu et al. (Fri,) studied this question.