Soil δ13C is an integrative indicator of carbon cycling, vegetation composition, and land-use dynamics. Despite the increasing availability of high-resolution environmental datasets, predicting soil δ13C remains challenging because the underlying biogeochemical processes are collinear and scale-dependent, and few studies have systematically compared feature selection strategies or regression algorithms across spatial scales. This study introduces a hierarchical framework for predicting the spatial distribution of soil δ13C across Brazil, comparing feature selection strategies and machine learning algorithms across three nested datasets: Cerrado (local), extended Cerrado (regional), and national scale. Predictors included climatic variables, topography, soil properties, and vegetation indices. Feature selection combined stepwise, recursive, and exhaustive searches, followed by variance inflation factor (VIF) filtering to reduce multicollinearity. Model benchmarking compared linear, kernel-based, and ensemble regressors under nested cross-validation, with performance assessed by the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE). Model performance declined with increasing spatial extent: the best VIF-constrained R² decreased from 0.77 (local) to 0.64 (regional) and 0.58 (national). Compact VIF-constrained subsets yielded accuracy similar to that of unconstrained sets, indicating that multicollinearity control improves parsimony without sacrificing predictive power. Ensemble regressors outperformed linear and kernel-based methods across all datasets. Feature importance shifted with spatial extent: vegetation productivity and seasonal climate jointly structured δ13C patterns, and no single predictor dominated across scales.
This framework advances δ13C isoscape modeling by combining predictive accuracy with interpretability, supporting applications in soil carbon monitoring, ecological research, and land-use planning.
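The VIF filtering step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold of 10 is a common rule of thumb and is not stated in the abstract, the iterative drop-the-worst strategy is one of several possible schemes, and the predictor names and synthetic data are hypothetical. Each predictor is regressed on the remaining ones, and VIF_j = 1 / (1 - R_j²) measures how much of predictor j the others already explain.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features).

    Assumes every column is non-constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all remaining predictors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_filter(X, names, threshold=10.0):
    """Drop the highest-VIF predictor until all VIFs are <= threshold.

    threshold=10 is an assumed convention, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


# Hypothetical predictors: three independent columns plus a near-copy of one.
rng = np.random.default_rng(0)
temp = rng.normal(size=200)
precip = rng.normal(size=200)
ndvi = rng.normal(size=200)
temp_dup = temp + 0.01 * rng.normal(size=200)  # almost collinear with temp

X = np.column_stack([temp, precip, ndvi, temp_dup])
names = ["temp", "precip", "ndvi", "temp_dup"]
selected = vif_filter(X, names)
print(selected)
```

In this toy case one member of the near-duplicate pair is removed and the independent predictors are retained, which mirrors the idea of a compact subset that keeps most of the predictive information.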
Júnior et al. (Mon,) studied this question.