• No single scoring rule best predicts Lockean accuracy across all belief thresholds.
• The Spherical score is optimal for lower thresholds; the Power 3 rule is best for higher thresholds.
• The widely used Brier and log scores rarely perform best for predicting categorical accuracy.
• Most scoring rules work by rewarding calibration and sharpness, but some track additional components.
• Empirical methods can complement a priori arguments in evaluating scoring rules.

The debate about which scoring rule best measures the accuracy of our credences has largely been conducted on an a priori basis. We pursue an empirical approach, asking which rule best predicts a practical, decision-relevant criterion: Lockean accuracy, the ability to make correct categorical judgments based on a threshold of belief. Analyzing a large dataset of probability judgments, we compare the most widely used scoring rules (Brier, logarithmic, spherical, absolute error, and power rules) and find that, among them, there is no single best one. Instead, the optimal choice is context dependent: the Spherical score is the best predictor for lower belief thresholds, while the Power 3 rule is best at higher thresholds. In particular, the widely used Brier and log scores are rarely optimal for this task. A mediation analysis reveals that while much of a rule’s success is explained by its ability to reward calibration and sharpness, the Spherical and Brier rules retain significant predictive power independently of these standard virtues.
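For readers unfamiliar with the rules named in the abstract, the following is a minimal sketch of their textbook forms for a single binary event, together with one possible reading of Lockean accuracy. It is not the authors' code. The function names, the power-rule parameterization (one common form, with the exponent 3 assumed to correspond to "Power 3"), and the treatment of suspended judgments are all assumptions made for illustration; the paper's exact definitions may differ.

```python
import math

# p is the probability assigned to the event; o is the outcome (1 = true, 0 = false).

def brier(p, o):
    """Brier penalty: squared error (lower is better)."""
    return (p - o) ** 2

def absolute_error(p, o):
    """Absolute-error penalty (lower is better)."""
    return abs(p - o)

def log_score(p, o):
    """Logarithmic score: log of the probability given to what happened (higher is better)."""
    q = p if o == 1 else 1 - p
    return math.log(q) if q > 0 else -math.inf

def spherical(p, o):
    """Spherical score: probability of the truth, normalized by the forecast's Euclidean length."""
    q = p if o == 1 else 1 - p
    return q / math.sqrt(p ** 2 + (1 - p) ** 2)

def power_score(p, o, alpha=3):
    """Power-rule score in one common parameterization (assumed, not taken from the paper)."""
    q = p if o == 1 else 1 - p
    return alpha * q ** (alpha - 1) - (alpha - 1) * (p ** alpha + (1 - p) ** alpha)

def lockean_accuracy(probs, outcomes, threshold):
    """Fraction of correct categorical beliefs under a Lockean threshold in (0.5, 1].

    Assumption: believe the event if p >= threshold, disbelieve it if
    p <= 1 - threshold, and suspend judgment (not counted) otherwise.
    Returns None if no beliefs are formed.
    """
    correct = formed = 0
    for p, o in zip(probs, outcomes):
        if p >= threshold:
            formed += 1
            correct += (o == 1)
        elif p <= 1 - threshold:
            formed += 1
            correct += (o == 0)
    return correct / formed if formed else None
```

Under these assumptions, a comparison of the kind the abstract describes would compute each forecaster's average score under every rule and then ask which rule's averages best predict `lockean_accuracy` at a given threshold.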
Igor Douven
Raja Marjieh
International Journal of Approximate Reasoning
Columbia University
Centre National de la Recherche Scientifique
Sorbonne Université
Douven et al. studied this question.
www.synapsesocial.com/papers/69a75e8fc6e9836116a29463 — DOI: https://doi.org/10.1016/j.ijar.2026.109636