What question did this study set out to answer?

Whether a CWT-CNN model can diagnose lung cancer from serum Raman spectra accurately, and whether its predictions can be interpreted.

April 22, 2026

Interpretable Wavelet-CNN for Accurate Serum Raman Lung Cancer Diagnosis under Leakage-Safe, Patient-Level Splits

Key Points

Evaluated the accuracy and interpretability of CWT-CNN for diagnosing lung cancer from serum Raman spectra.
Applied continuous wavelet transformation (CWT) with convolutional neural networks (CNN) to spontaneous Raman spectra of serum.
Used Gradient-weighted Class Activation Mapping (Grad-CAM) and inverse-CWT reconstruction to attribute predictions to spectral features.
Analyzed 213 patient serum samples (106 lung cancer, 107 controls) from a retrospective cohort collected over 3 years.
Achieved 90.5% accuracy (19/21 correct diagnoses) in an independent validation cohort under strict patient-wise data splitting.
Reached 91.7% sensitivity and 88.9% specificity in that validation cohort.
Identified Raman shifts at 1004 cm⁻¹ (phenylalanine), 1129 cm⁻¹ (lipid trans-conformation), 1458 cm⁻¹ (nucleotides), and 1560 cm⁻¹ (tryptophan) as the features driving classification; these molecules have established roles in cancer metabolism.
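
The accuracy, sensitivity, and specificity figures can be cross-checked against each other. The sketch below is not from the study: it enumerates every way 21 validation patients, 19 of them classified correctly (the counts given in the abstract), could divide into cancer and control cases, and keeps the divisions whose rates round to the reported percentages. The function name and the tolerance of half a unit in the last reported decimal are illustrative assumptions.

```python
def consistent_confusion_counts(n_total, n_correct, sens_pct, spec_pct, tol=0.05):
    """Enumerate (n_pos, tp, n_neg, tn) tuples that reproduce the reported rates.

    n_pos / n_neg are the numbers of cancer / control patients;
    tp / tn are the correctly classified patients in each group.
    """
    matches = []
    for n_pos in range(1, n_total):
        n_neg = n_total - n_pos
        for tp in range(n_pos + 1):
            tn = n_correct - tp          # correct calls must sum to n_correct
            if not 0 <= tn <= n_neg:
                continue
            sens = 100 * tp / n_pos      # sensitivity in percent
            spec = 100 * tn / n_neg      # specificity in percent
            if abs(sens - sens_pct) < tol and abs(spec - spec_pct) < tol:
                matches.append((n_pos, tp, n_neg, tn))
    return matches


# Figures reported for the validation cohort: 19/21 correct, 91.7%, 88.9%.
print(consistent_confusion_counts(21, 19, 91.7, 88.9))
# -> [(12, 11, 9, 8)]
```

Under these assumptions the only consistent breakdown is 11 of 12 cancer cases and 8 of 9 controls classified correctly. The source does not state these per-class counts, so they should be read as an inference from the reported percentages.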

Abstract

Clinical cancer diagnostics require ML that maintains accuracy under biological variability with interpretable feature attribution, a particular challenge for serum-based approaches where healthy and diseased samples share >95% chemical composition. Continuous wavelet transformation (CWT) combined with convolutional neural networks (CNN) has demonstrated robust classification of Raman spectra for materials under synthetic noise conditions, but whether this approach can handle biological variability in clinical samples, and which spectral features drive its predictions, has not been explored. Here, we demonstrate application of CWT-CNN deep learning to clinical disease diagnosis, analyzing spontaneous Raman spectra from a retrospective cohort of 213 patient serum samples (106 lung cancer, 107 controls) collected over 3 years. We extend the established CWT-CNN framework with interpretability analysis using Gradient-weighted Class Activation Mapping (Grad-CAM) and inverse-CWT reconstruction. Using only 5 μL of serum and 10 min of acquisition time per patient, our approach achieved 90.5% accuracy in an independent validation cohort (19/21 correct diagnoses, 91.7% sensitivity, 88.9% specificity) using strict patient-wise data splitting. Interpretability analysis revealed that classification decisions focus on Raman shifts at 1004 cm⁻¹ (phenylalanine), 1129 cm⁻¹ (lipid trans-conformation), 1458 cm⁻¹ (nucleotides), and 1560 cm⁻¹ (tryptophan). These spectral features correspond to molecules with established roles in cancer metabolism. This demonstration that CWT-CNN maintains high accuracy under biological variability and leakage-safe, patient-level validation, combined with biochemically meaningful feature attribution, establishes a data-first approach where comprehensive spectral analysis enables both diagnostic accuracy and identification of disease-relevant molecular features.
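
The "leakage-safe, patient-level" splitting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes each spectrum is tagged with a patient identifier and that a patient may contribute several spectra, and the function name, the `val_fraction` default, and the toy identifiers are all hypothetical. The point it shows is that patients, not individual spectra, are shuffled and assigned, so no patient contributes data to both sets.

```python
import random
from collections import defaultdict


def patient_level_split(patient_ids, val_fraction=0.1, seed=0):
    """Split spectrum indices into train/validation sets by patient.

    patient_ids[i] is the (hypothetical) patient label of spectrum i.
    Returns (train_idx, val_idx); no patient appears in both.
    """
    # Group spectrum indices by the patient they came from.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not spectra) reproducibly.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out a fraction of the patients, at least one.
    n_val = max(1, round(val_fraction * len(patients)))
    val_idx = sorted(i for pid in patients[:n_val] for i in by_patient[pid])
    train_idx = sorted(i for pid in patients[n_val:] for i in by_patient[pid])
    return train_idx, val_idx


# Toy example: 6 patients, some with repeated spectra.
ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P5", "P6"]
train_idx, val_idx = patient_level_split(ids, val_fraction=0.34, seed=1)
train_patients = {ids[i] for i in train_idx}
val_patients = {ids[i] for i in val_idx}
print(train_patients.isdisjoint(val_patients))  # -> True
```

A spectrum-level random split would instead risk placing spectra from one patient on both sides, which tends to inflate validation accuracy. For larger pipelines, scikit-learn's `GroupShuffleSplit` offers the same grouping idea.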
