Despite ongoing theoretical research on cross-validation (CV), many theoretical questions about CV remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds in k-fold cross-validation. Our results consist of a novel decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and includes a novel algorithmic stability notion, squared loss stability, that is considerably weaker than the typically required hypothesis stability in other comparable works. Furthermore, we prove: 1. For every learning algorithm that minimizes empirical error, a minimax lower bound on the mean-squared error of k-fold CV estimating the population risk LD: \ ₊ ₍\; ₃\; E\![ (L₂ₕ^ (k) - L₃) ^2 \;=\; Ω\! (k/n), \] where n is the sample size and k the number of folds. This shows that even under idealized conditions, for large values of k, CV cannot attain the optimum of order 1/n achievable by a validation set of size n, reflecting an inherent penalty caused by dependence between folds. 2. Complementing this, we exhibit learning rules for which \ ₃\; E\![ (L₂ₕ^ (k) - L₃) ^2 \;=\; Ω (k/n), \] matching (up to constants) the accuracy of a hold-out estimator of a single fold of size n/k. Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all n samples for unbiased risk evaluation, and its minimax performance is pinned between the k/n and k/n regimes.
Building similarity graph...
Analyzing shared references across papers
Loading...
Nachum et al. (Wed,) studied this question.
www.synapsesocial.com/papers/690fdcdaf60c54d04ea3821b — DOI: https://doi.org/10.48550/arxiv.2511.03554
Ido Nachum
Rüdiger Urbanke
Thomas Weinberger
Building similarity graph...
Analyzing shared references across papers
Loading...