Abstract
Recent advances in vision-language models (VLMs) have demonstrated strong multimodal capabilities for medical image analysis. However, how confident these models are in their diagnostic predictions is often unclear, which limits their adoption in clinical settings. We introduce CALM-VLM (CALibration Mechanism for Vision-Language Models), which integrates confidence calibration and selective prediction into a generative 3D VLM. To create CALM-VLM, we fine-tuned the Med3DVLM architecture for Alzheimer's disease (AD) and stroke classification as initial test cases. To improve reliability, we applied temperature scaling to the VLM's generative outputs. The calibrated model then selectively abstained from predictions when uncertain, which also improved its diagnostic accuracy. Experiments on multi-site MRI datasets from 10 countries show that CALM-VLM produced better-calibrated confidence estimates than uncalibrated VLMs. Coverage-adjusted test area under the receiver operating characteristic curve (ROC-AUC) increased by 5% to 13% for both diagnostic tasks across independent test sets. Our calibrated VLM achieved a test ROC-AUC of 0.951 for AD classification and 0.905 for stroke classification. These findings highlight the importance of calibrated, uncertainty-aware VLMs for trustworthy neuroimaging AI.
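Temperature scaling followed by selective prediction can be illustrated for the binary case with a small sketch. This is not the authors' implementation; it rests on several assumptions the abstract does not state: the VLM's generative output has already been reduced to two class logits per scan (index 1 taken as the positive class, e.g. AD or stroke), the temperature T is fit by grid search on a held-out validation set to minimize negative log-likelihood, and the model abstains when its calibrated top-class probability falls below a chosen threshold. All function names and the toy logits are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; T > 1 flattens (softens) the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the true class at a given temperature."""
    loss = 0.0
    for row, y in zip(logit_rows, labels):
        loss -= math.log(max(softmax(row, temperature)[y], 1e-12))
    return loss / len(labels)


def fit_temperature(logit_rows, labels, grid=None) -> float:
    """Assumed fitting procedure: grid search for the T that minimizes validation NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # candidate T values from 0.5 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def selective_predict(logit_rows, labels, temperature: float, threshold: float) -> Tuple[float, float]:
    """Abstain when the calibrated top-class probability is below `threshold`.

    Returns (coverage, ROC-AUC on the retained cases only).
    """
    kept_scores, kept_labels = [], []
    for row, y in zip(logit_rows, labels):
        probs = softmax(row, temperature)
        if max(probs) >= threshold:
            kept_scores.append(probs[1])  # calibrated P(positive class)
            kept_labels.append(y)
    coverage = len(kept_labels) / len(labels)
    return coverage, roc_auc(kept_scores, kept_labels)


if __name__ == "__main__":
    # Made-up validation logits: overconfident, one of three confident positives is wrong.
    val_logits = [[0.0, 4.0], [0.0, 4.0], [0.0, 4.0], [4.0, 0.0]]
    val_labels = [1, 1, 0, 0]
    t_hat = fit_temperature(val_logits, val_labels)
    print(f"fitted T = {t_hat:.2f}")  # T > 1: the raw logits were overconfident

    # Made-up test logits: two high-margin cases and two near-ties.
    test_rows = [[0.0, 3.0], [3.0, 0.0], [0.0, 0.2], [0.2, 0.0]]
    test_labels = [1, 0, 0, 1]
    print(selective_predict(test_rows, test_labels, t_hat, 0.5))  # (1.0, 0.75): no abstention
    print(selective_predict(test_rows, test_labels, t_hat, 0.6))  # (0.5, 1.0): near-ties abstained
```

On these toy inputs, abstaining on the two low-margin cases halves coverage while ROC-AUC on the retained cases rises from 0.75 to 1.0, which is the kind of trade-off a coverage-adjusted ROC-AUC summarizes. In the binary case, temperature scaling alone does not reorder the positive-class scores, so full-coverage ROC-AUC is unchanged; what calibration changes is how meaningful a fixed abstention threshold is.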
Dhinagar et al. studied this question.