Leaf Area Index (LAI) is a key biophysical parameter for characterizing canopy structure and plays a critical role in precision agricultural monitoring and management. However, traditional optical remote sensing is often hindered by signal saturation in high-density canopies, and existing machine learning methods typically lack an interpretability analysis of feature sensitivity across phenological stages. To address these challenges, this study established a dataset of 1296 ground samples from two-year field experiments (2024–2025) with varying water and nitrogen gradients. Using an Unmanned Aerial Vehicle (UAV) platform equipped with multispectral, thermal infrared, and Light Detection and Ranging (LiDAR) sensors, data were acquired during three key growth stages (trumpet, tasseling, and grain filling). We developed a robust Stacking ensemble framework integrating heterogeneous base learners, including Ridge Regression (Ridge), Support Vector Regression (SVR), eXtreme Gradient Boosting (XGBoost), and Multi-Layer Perceptron (MLP). This ensemble model was systematically compared with five single regression models and a Transformer model. Ablation experiments were designed to quantify the contributions of the different modalities, and the SHapley Additive exPlanations (SHAP) method was employed to trace how feature sensitivity evolves across growth stages. Results indicate that: (1) The proposed Stacking model exhibited superior generalization and cross-year stability across the six stage-year datasets (three growth stages in each of two years), with R² = 0.808–0.849 and RMSE = 0.129–0.251. (2) Ablation analysis confirmed that the four-modal combination of “spectral feature + texture feature + structure feature + thermal feature” achieved the highest accuracy. It significantly outperformed single-modal and dual-modal combinations, effectively overcoming the spectral saturation effect during reproductive growth stages. (3) SHAP analysis revealed a hierarchical decision-making mechanism across growth stages, in which thermal and structural features serve as core drivers, while spectral and texture features act as fine-tuning factors. Beyond providing a high-precision, robust solution for crop monitoring, this study explains, from a data science perspective, the mechanism by which multimodal fusion overcomes optical limitations, offering theoretical support for intelligent decision-making in precision agriculture.

• Multimodal fusion overcomes spectral saturation in maize LAI prediction.
• Integration of four complementary feature types (structural, textural, spectral, and thermal) enables holistic canopy characterization.
• Stacking model outperformed benchmark algorithms.
• Structural and thermal features consistently dominated across all stages.
• SHAP analysis revealed a hierarchical decision-making mechanism across growth stages.
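The Stacking framework described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the hyperparameters are arbitrary, the Ridge meta-learner and 5-fold setting are assumptions (the abstract does not name them), and scikit-learn's GradientBoostingRegressor stands in for XGBoost so that the example needs only scikit-learn and NumPy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for fused UAV features (NOT the study's data):
# 8 columns play the role of spectral, texture, structure and thermal features.
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 8))
y = (3.0 + 0.8 * X[:, 0] + 0.5 * np.tanh(X[:, 1])
     - 0.3 * X[:, 2] * X[:, 3] + rng.normal(scale=0.1, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Heterogeneous base learners; scale-sensitive models get a StandardScaler.
base_learners = [
    ("ridge", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
    # Stand-in for XGBoost; xgboost.XGBRegressor could be used here instead.
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    )),
]

# Out-of-fold predictions of the base learners train the meta-learner.
# The Ridge meta-learner and cv=5 are assumptions made for this sketch.
stack = StackingRegressor(
    estimators=base_learners, final_estimator=Ridge(alpha=1.0), cv=5
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Because the meta-learner is fitted on cross-validated predictions rather than on in-sample fits, it is less likely to simply favor whichever base learner overfits the training set most, which is one common motivation for stacking heterogeneous models.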
Wang et al. (Wed,) studied this question.
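The ablation design and the two reported accuracy metrics can be sketched in a few lines. The abstract does not list which modality combinations were actually tested, so the example below simply enumerates every non-empty subset of the four feature groups; the function names and the toy numbers are illustrative assumptions, not values from the study.

```python
from itertools import combinations
from math import sqrt

# The four feature groups named in the abstract.
MODALITIES = ("spectral", "texture", "structure", "thermal")


def modality_subsets(modalities):
    """Return every non-empty combination of modalities, smallest first."""
    return [
        combo
        for size in range(1, len(modalities) + 1)
        for combo in combinations(modalities, size)
    ]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


subsets = modality_subsets(MODALITIES)
single = [s for s in subsets if len(s) == 1]
dual = [s for s in subsets if len(s) == 2]
print(len(subsets), len(single), len(dual))  # 15 4 6

# Toy values (hypothetical, for checking the metric functions only).
toy_true = [1.0, 2.0, 3.0]
toy_pred = [1.0, 2.0, 4.0]
print(r2(toy_true, toy_pred))  # 0.5
```

In an actual ablation, each subset would select the matching feature columns, the model would be refit, and `r2` and `rmse` would be computed on held-out samples so that the combinations can be compared on equal terms.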