Clinical cancer diagnostics require machine learning (ML) models that remain accurate under biological variability and provide interpretable feature attribution. This is particularly challenging for serum-based approaches, where healthy and diseased samples share >95% of their chemical composition. Continuous wavelet transformation (CWT) combined with convolutional neural networks (CNN) has been shown to classify Raman spectra of materials robustly under synthetic noise. However, two questions have not been explored: whether this approach can handle the biological variability of clinical samples, and which spectral features drive its predictions. Here, we apply CWT-CNN deep learning to clinical disease diagnosis, analyzing spontaneous Raman spectra from a retrospective cohort of 213 patient serum samples (106 lung cancer, 107 controls) collected over 3 years. We extend the established CWT-CNN framework with interpretability analysis using Gradient-weighted Class Activation Mapping (Grad-CAM) and inverse-CWT reconstruction. Using only 5 μL of serum and 10 min of acquisition time per patient, and with strict patient-wise data splitting, our approach achieved 90.5% accuracy in an independent validation cohort (19/21 correct diagnoses; 91.7% sensitivity, 88.9% specificity). Interpretability analysis revealed that classification decisions focus on Raman shifts at 1004 cm⁻¹ (phenylalanine), 1129 cm⁻¹ (lipid trans-conformation), 1458 cm⁻¹ (nucleotides), and 1560 cm⁻¹ (tryptophan). These spectral features correspond to molecules with established roles in cancer metabolism. By showing that CWT-CNN maintains high accuracy under biological variability and leakage-safe, patient-level validation while yielding biochemically meaningful feature attribution, this work establishes a data-first approach in which comprehensive spectral analysis enables both accurate diagnosis and the identification of disease-relevant molecular features.
Zhang et al. (Mon,) studied this question.
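To make the CWT step concrete, the following is a minimal sketch of how a one-dimensional Raman spectrum becomes a two-dimensional scalogram (scales × Raman-shift positions), the image-like input a CNN can consume. It assumes a Ricker (Mexican-hat) mother wavelet and a direct-sum implementation in plain Python. The wavelet family, scales, and software used in this work are not specified here, so `ricker`, `cwt_ricker`, and the synthetic single-peak spectrum are hypothetical illustrations, not the authors' pipeline.

```python
import math


def ricker(t: float) -> float:
    """Ricker (Mexican-hat) mother wavelet, normalized to unit L2 norm."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt_ricker(signal, scales):
    """Direct-sum CWT (hypothetical helper): one row per scale, one column per sample.

    W[a][b] = (1 / sqrt(a)) * sum_t x[t] * psi((t - b) / a)
    Samples outside the signal are treated as zero (no padding or mirroring).
    """
    n = len(signal)
    rows = []
    for a in scales:
        # Truncate the wavelet support to +/- 5 scale units; the Ricker
        # envelope is negligible beyond that.
        half = int(math.ceil(5 * a))
        row = []
        for b in range(n):
            lo, hi = max(0, b - half), min(n, b + half + 1)
            acc = sum(signal[t] * ricker((t - b) / a) for t in range(lo, hi))
            row.append(acc / math.sqrt(a))
        rows.append(row)
    return rows


# Usage: a synthetic Gaussian "band" centred at index 50 (hypothetical data,
# not from the cohort). Each scalogram row responds most strongly at the band centre.
spectrum = [math.exp(-((i - 50) ** 2) / (2 * 3.0 ** 2)) for i in range(101)]
scalogram = cwt_ricker(spectrum, scales=[1, 2, 4, 8])
```

Small scales emphasize narrow, sharp bands, while large scales capture broad features and baseline trends. This is one plausible reason a scale-resolved representation can help a CNN separate informative peaks from slowly varying background. A production pipeline would more likely use an FFT-based library routine than this O(n·support) loop.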
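The "strict patient-wise data splitting" and the reported validation figures can be illustrated with a short sketch. The split below assigns whole patients, never individual spectra, to either the training or the test set, so replicate measurements from one patient cannot leak across the boundary. The replicate-per-patient structure, the IDs, and the helper names `patient_wise_split` and `diagnostic_metrics` are hypothetical, and the split is unstratified for simplicity. The confusion counts (TP = 11, FN = 1, TN = 8, FP = 1) are not stated explicitly in this section. They are inferred as the only integer counts over 21 validation samples that reproduce the reported 91.7% sensitivity and 88.9% specificity.

```python
import random


def patient_wise_split(sample_ids, patient_of, test_fraction=0.1, seed=0):
    """Split sample IDs so that every patient's spectra land in exactly one set."""
    patients = sorted({patient_of[s] for s in sample_ids})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [s for s in sample_ids if patient_of[s] not in test_patients]
    test = [s for s in sample_ids if patient_of[s] in test_patients]
    return train, test


def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Usage with hypothetical IDs: 10 patients x 3 replicate spectra each.
samples = [f"p{p}_r{r}" for p in range(10) for r in range(3)]
patient_of = {s: s.split("_")[0] for s in samples}
train, test = patient_wise_split(samples, patient_of, test_fraction=0.2, seed=1)

# Counts inferred from the reported percentages (12 cancer, 9 control in validation).
metrics = diagnostic_metrics(tp=11, fn=1, tn=8, fp=1)
# → accuracy ≈ 0.905, sensitivity ≈ 0.917, specificity ≈ 0.889
```

Splitting by sample rather than by patient would let near-identical replicate spectra from one person appear in both sets, which typically inflates test accuracy. Grouping by patient ID avoids that leakage.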