Karst aquifers present unique challenges for groundwater level prediction because of their dual-porosity structure and highly nonlinear hydrological response. This study systematically evaluates nine machine learning and deep learning models (RF, XGBoost, LSTM, CNN, Transformer, N-BEATS, CNN-LSTM, Seq2Seq-LSTM, and Attention-Seq2Seq-LSTM) for rainfall-driven groundwater level forecasting in the Maocun subterranean river catchment, Guilin, Guangxi, China. Two years of hourly data from three monitoring sites representing distinct hydrogeological zones (recharge, flow, and discharge) were used within a multidimensional evaluation framework that integrates single-step accuracy, multi-step stability, and computational efficiency. The Transformer achieved the highest single-step prediction accuracy, with the lowest RMSE (0.130–0.606 m) and the highest R² (0.813–0.965) at all three sites. CNN-LSTM offered the best balance between predictive performance and computational cost, with an average training time of only 27.97 s and an average of 28.0 epochs to convergence. N-BEATS demonstrated superior long-term stability in 12-step-ahead forecasting, achieving R² = 0.914 at ZK1 and outperforming all other architectures. More fundamentally, hydrogeological complexity exerted a dominant control on predictive skill that systematically outweighed differences arising from model architecture. No model exceeded R² = 0.813 at the geologically complex ZK2 site, whereas every model exceeded R² = 0.950 at ZK1, indicating that aquifer complexity, rather than algorithm selection, is the primary constraint on prediction feasibility. This study presents the first application of N-BEATS to karst groundwater level forecasting and proposes a replicable multidimensional evaluation framework, providing a scientific reference for intelligent modelling of complex karst systems.
Zhu et al. studied this question.
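The abstract reports accuracy as RMSE and R², both for single-step and for multi-step (for example, 12-step-ahead) forecasts. The sketch below shows one common way such scores could be computed per lead step. It assumes the standard definitions of RMSE and the coefficient of determination; the abstract does not state the exact formulas used, and the function names (`rmse`, `r2`, `per_horizon_scores`) and the toy values are hypothetical, not taken from the study.

```python
import math

def rmse(obs, pred):
    """Root-mean-square error, in the same units as the data (e.g. metres)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def per_horizon_scores(obs_windows, pred_windows):
    """Score each lead step separately.

    Both arguments are lists of equal-length windows, where window[h] is the
    observed or predicted level h+1 steps ahead. Returns one (rmse, r2) pair
    per lead step, so the decay of skill with horizon can be inspected.
    """
    n_steps = len(obs_windows[0])
    scores = []
    for h in range(n_steps):
        obs_h = [w[h] for w in obs_windows]
        pred_h = [w[h] for w in pred_windows]
        scores.append((rmse(obs_h, pred_h), r2(obs_h, pred_h)))
    return scores

# Toy two-step example (made-up values, not study data).
obs_windows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
pred_windows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 5.0]]
scores = per_horizon_scores(obs_windows, pred_windows)
```

Scoring each lead step on its own, rather than pooling all steps, is what makes a comparison of long-horizon stability between architectures possible under this kind of framework.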