OBJECTIVE: To develop an architecture-agnostic framework that estimates, calibrates, and leverages total uncertainty (aleatoric plus epistemic) in pre-trained deep-learning denoising models for low-dose computed tomography (CT).
METHODS: Aleatoric uncertainty was estimated with physics-based inference-time augmentation, and epistemic uncertainty with training-free, post-hoc Monte Carlo dropout; both were then re-calibrated non-parametrically to improve uncertainty calibration. To leverage uncertainty, we explored adaptive local fusion (ALF) guided by the local mean-to-uncertainty ratio. As a proof of concept, the framework was assessed using pre-trained U-Net and ResNet-based models across datasets that varied in CT task, radiation dose, and lesion characteristics. Uncertainty estimation and calibration were assessed in cadaver scans using normalized root-mean-square error (NRMSE) and normalized calibration error (NCE), respectively. ALF was evaluated on chest and liver exams using noise, structural similarity index (SSIM), and lesion detectability. Lesion detectability was quantified using a clinically validated deep-learning model observer, with the Wilcoxon signed-rank test used to assess significance.
RESULTS: The framework provided accurate uncertainty quantification and calibration: NRMSE ranged from 1.2% to 2.4% and NCE from 0.9% to 2.2%. Compared with the original pre-trained models, ALF yielded comparable or lower noise and improved lesion structural fidelity and detectability (p<0.05). For lung nodules: noise reduction up to 69.7%, SSIM range (ALF vs pre-trained) 0.92 to 0.96 vs 0.78 to 0.87, and detectability improvement up to 12.9%. For liver metastases: noise reduction up to 35.0%, SSIM range (ALF vs pre-trained) 0.82 to 0.99 vs 0.80 to 0.98, and detectability improvement up to 13.2%.
CONCLUSION: Our framework effectively benchmarked and utilized total uncertainty to enhance diagnostic image quality with pre-trained CT denoising models.
SIGNIFICANCE: This framework can facilitate performance monitoring and deployment optimization, and can help establish the trustworthiness of pre-trained CT denoising models.
Gong et al. (Thu,) studied this question.
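
The abstract does not give formulas, so the following is only a minimal sketch of how total uncertainty and a mean-to-uncertainty ratio could be computed from repeated model outputs. It rests on several assumptions that are not stated in the source: aleatoric and epistemic variances are treated as independent and summed, images are represented as flat lists of pixel values, the ratio is computed per pixel rather than over a local window, and the blending rule in `fuse` (including the constant `k`) is purely hypothetical. All function names are illustrative.

```python
import math
from statistics import fmean, pvariance


def pixelwise_variance(samples):
    """Per-pixel population variance across repeated outputs.

    samples: list of images, each a flat list of pixel values.
    """
    return [pvariance(column) for column in zip(*samples)]


def total_uncertainty(aleatoric_samples, epistemic_samples):
    """Return (mean image, total standard deviation image).

    aleatoric_samples: outputs from inference-time augmentation.
    epistemic_samples: outputs from Monte Carlo dropout passes.
    ASSUMPTION: the two variance terms are independent and additive.
    """
    var_a = pixelwise_variance(aleatoric_samples)
    var_e = pixelwise_variance(epistemic_samples)
    pooled = list(aleatoric_samples) + list(epistemic_samples)
    mean = [fmean(column) for column in zip(*pooled)]
    sigma = [math.sqrt(a + e) for a, e in zip(var_a, var_e)]
    return mean, sigma


def mean_to_uncertainty_ratio(mean, sigma, eps=1e-8):
    """Per-pixel |mean| / sigma; eps guards against division by zero."""
    return [abs(m) / (s + eps) for m, s in zip(mean, sigma)]


def fuse(image_a, image_b, ratio, k=1.0):
    """HYPOTHETICAL fusion rule: weight image_a more where the ratio is high."""
    weights = [r / (r + k) for r in ratio]
    return [w * a + (1.0 - w) * b for w, a, b in zip(weights, image_a, image_b)]


if __name__ == "__main__":
    aleatoric = [[1, 2], [3, 2]]   # two augmented passes, two pixels each
    epistemic = [[2, 4], [2, 0]]   # two dropout passes, two pixels each
    mean, sigma = total_uncertainty(aleatoric, epistemic)
    ratio = mean_to_uncertainty_ratio(mean, sigma)
    print(mean, sigma, ratio)
```

In this sketch a high ratio marks pixels where the signal is large relative to its estimated uncertainty; how the actual ALF step uses that ratio, and over what neighborhood it is computed, is not specified in the abstract.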