Bayesian machine learning models, especially Bayesian neural networks (BNNs), offer powerful black-box approaches for prediction and uncertainty quantification. However, these models frequently exhibit inconsistent prediction quality across input regions, and conventional global metrics such as the mean squared error (MSE) are inadequate for capturing such local discrepancies. To overcome this limitation, we introduce a novel kernel-based framework for local calibration testing that assesses how well predicted distributions reflect both the function to be learned and the inherent uncertainties. In our approach, spherical input-space kernels define the relevant subsets of data in the neighborhood of a point to be tested. This enables the online assessment of these localized regions using calibration metrics or statistical tests. By aggregating results across multiple kernel widths, our method yields both robust binary decisions and a continuous analysis over arbitrary inputs. Numerical experiments on single- and multi-dimensional regression tasks demonstrate the efficiency and scalability of our approach, underscoring its potential for real-time and large-scale applications.
Walker et al. (Wed,) studied this question.
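To make the idea of local calibration testing more concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes Gaussian predictive distributions, a hard spherical (ball) kernel, the empirical coverage of a central prediction interval as the calibration metric, and a simple majority vote over kernel radii as the aggregation rule. The function names (`pit_values`, `local_coverage`, `aggregate_decision`) and the parameters `level` and `z_crit` are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def pit_values(y, mu, sigma):
    """Probability integral transform of y under assumed Gaussian predictions N(mu, sigma^2)."""
    z = (np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)) / (
        np.asarray(sigma, dtype=float) * math.sqrt(2.0)
    )
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(z))


def local_coverage(X, pit, x0, radius, level=0.9):
    """Coverage of the central `level` interval among points within `radius` of x0.

    Returns (coverage, number_of_points); coverage is NaN if the ball is empty.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a flat array as n points in one dimension
    dist = np.linalg.norm(X - np.asarray(x0, dtype=float), axis=1)
    mask = dist <= radius  # hard spherical kernel: inside the ball or not
    n = int(mask.sum())
    if n == 0:
        return float("nan"), 0
    lo, hi = (1.0 - level) / 2.0, (1.0 + level) / 2.0
    local_pit = np.asarray(pit, dtype=float)[mask]
    covered = (local_pit >= lo) & (local_pit <= hi)
    return float(covered.mean()), n


def aggregate_decision(X, pit, x0, radii, level=0.9, z_crit=1.96):
    """Majority vote over radii (assumed aggregation rule).

    Each radius votes "calibrated" if the local coverage lies within a
    normal-approximation binomial band around the nominal level.
    Returns None if no radius contains any data.
    """
    votes = []
    for r in radii:
        cov, n = local_coverage(X, pit, x0, r, level)
        if n == 0:
            continue
        se = math.sqrt(level * (1.0 - level) / n)
        votes.append(abs(cov - level) <= z_crit * se)
    if not votes:
        return None
    return sum(votes) > len(votes) / 2


# Illustrative usage on synthetic data: the noise level is larger for x > 0,
# while the (hypothetical) model predicts unit variance everywhere.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(2000, 1))
noise_sd = np.where(X_demo[:, 0] > 0.0, 3.0, 1.0)
y_demo = rng.normal(0.0, noise_sd)
pit_demo = pit_values(y_demo, mu=0.0, sigma=1.0)
for x0 in (-0.5, 0.5):
    print(x0, aggregate_decision(X_demo, pit_demo, [x0], radii=[0.1, 0.2, 0.4]))
```

In this sketch, a statistical test on the local PIT values (for example a uniformity test) could replace the coverage check, and a smooth kernel weighting could replace the hard ball; both are design choices that the sketch leaves open.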