Dear Editor,

We read with great interest the article by Lan et al., entitled “Deep learning enhanced MRI radiomics in predicting pathologic response of head and neck squamous carcinoma to neoadjuvant chemoimmunotherapy: a retrospective analysis” [1]. This study integrates deep learning (DL), an advanced computational approach, with traditional radiomics and clinicopathological features to construct a multimodal model aimed at accurately predicting pathological complete response (pCR) to neoadjuvant chemoimmunotherapy in patients with head and neck squamous cell carcinoma. As a multicenter retrospective investigation with a systematic design and comprehensive validation, it presents a promising approach for applying artificial intelligence to precision oncology. However, although the model performs well, its practical implementation and the transition from laboratory performance to clinical trustworthiness present challenges that require further discussion.

First, heterogeneity in sample distribution. As shown in Table 1 of the article, significant differences exist between the training set (primarily from the SYSMH center) and the external validation set (from the SYSUCC and FFPH centers) across multiple key baseline characteristics. Notably, the gender ratio (63.4% male in the training set vs. 17.7% in the validation set) and the T/N stage distribution differ substantially. Although the model maintained an AUC of 0.740 on this distinct validation set, demonstrating some robustness, this raises a critical question: Is the model capturing universally applicable imaging-biological patterns inherently related to pCR, or is it primarily adapting to the specific patient subgroup and imaging protocols in the training set? When the model is applied to a new population with an even more divergent distribution, performance degradation could be more pronounced.

Second, non-uniform data quality and selection bias. Retrospective studies rely on historically archived images and medical records. The paper mentions excluding 190 cases due to incomplete MRI data and 27 cases due to poor image quality. This exclusion process produces a final cohort of 282 patients with relatively “qualified” and “standardized” images. However, in real-world clinical workflows, variable image quality, differing patient compliance, and minor adjustments in scanning protocols are common, resulting in “imperfect” data [2]. A model optimized on curated retrospective data may exhibit reduced robustness when applied prospectively to unscreened, heterogeneous data. Can the model handle suboptimal or unstandardized MRI? Prospective validation is necessary to answer this question.

Third, clinical translation beyond AUC. The integrated model combining clinical, traditional radiomic, and DL features achieved the highest AUC across the training, testing, and external validation sets (0.781, 0.759, and 0.740, respectively), which is encouraging. However, AUC alone may obscure clinically relevant information [3]. For instance, in the testing set, the integrated model demonstrated high sensitivity (0.857) but only moderate specificity (0.714), with a positive predictive value of 0.500. While highly sensitive in identifying potential responders, the model also produces a substantial number of false positives: at a positive predictive value of 0.500, only half of the patients predicted to achieve pCR actually do so. Therefore, additional metrics such as the F1 score, threshold analysis, and precision should be considered for clinical interpretation.

In summary, this study represents a rigorous and insightful exploration, demonstrating the feasibility of integrating DL into tumor imaging workflows and showing that multimodal fusion can enhance predictive performance. Future work should focus on three directions: integrating explainable AI and multi-omics data (e.g., genomics and metabonomics) to enhance the biological interpretability of DL features; rigorously evaluating model performance and calibration across diverse subgroups in large prospective multicenter cohorts with adaptive decision rules; and exploring associations between predictions and long-term outcomes, so that the tool can evolve from a treatment response predictor into a prognostic assessment system.

The researchers independently conducted all aspects of this study, including design, data analysis, and conclusion derivation, without using any AI tools. In alignment with the TITAN guidelines (Agha et al., 2025), we ensured transparency in reporting this AI-related research [4].
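
As a hedged illustration of the third point, the short Python sketch below works only from the three testing-set values quoted in this letter (sensitivity 0.857, specificity 0.714, positive predictive value 0.500). It derives the F1 score and the pCR prevalence implied by those values through Bayes' rule, then shows how the positive predictive value would shift at other prevalences. The function names and the two alternative prevalence values (0.10 and 0.40) are our own assumptions for illustration, not figures from the article, and the sketch assumes that sensitivity and specificity remain fixed when prevalence changes, which may not hold in practice.

```python
# Minimal sketch: metrics derived from the testing-set values quoted in the letter.
# Function names and the alternative prevalences are illustrative assumptions.

def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of recall (sensitivity) and precision (PPV)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def implied_prevalence(sensitivity: float, specificity: float, ppv: float) -> float:
    """Prevalence consistent with the three metrics, obtained from Bayes' rule.

    PPV = sens*p / (sens*p + (1 - spec)*(1 - p))
    =>  odds(p) = odds(PPV) * (1 - spec) / sens
    """
    odds = (ppv / (1 - ppv)) * (1 - specificity) / sensitivity
    return odds / (1 + odds)


def ppv_at(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV at a given prevalence, assuming sensitivity and specificity stay fixed."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values reported for the integrated model in the testing set.
SENS, SPEC, PPV = 0.857, 0.714, 0.500

f1 = f1_score(SENS, PPV)
prev = implied_prevalence(SENS, SPEC, PPV)
print(f"F1 score: {f1:.3f}")
print(f"Implied pCR prevalence: {prev:.3f}")

# Hypothetical prevalences, chosen only to show the direction of the effect.
for p in (0.10, prev, 0.40):
    print(f"prevalence {p:.2f} -> PPV {ppv_at(SENS, SPEC, p):.3f}")
```

Under these assumptions, the F1 score falls well below the reported sensitivity, and the positive predictive value drops as pCR becomes rarer in the target population. This is why the letter asks for threshold analysis and prevalence-aware metrics alongside AUC.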
Chun-Yan Zhao
Jing Luo
Jin-Liang Kong
International Journal of Surgery
Guangxi Medical University
First Affiliated Hospital of GuangXi Medical University
Online Technologies (United States)
Zhao et al. studied this question.
www.synapsesocial.com/papers/69a75bc4c6e9836116a23b53 — DOI: https://doi.org/10.1097/js9.0000000000004865