Mowing is a key grassland management practice that helps maintain the stability, productivity, and economic value of grassland ecosystems. Large-scale monitoring techniques that detect whether mowing has occurred are therefore of scientific and practical importance, both for understanding how grassland ecosystems respond to management and for optimizing management strategies. This study focuses on the concentrated grassland area of the Xilingol League in Inner Mongolia, restricted to the SAR-covered western sub-region. All classification accuracies reported here are obtained under spatially random train/test splits and represent an upper bound; generalization to geographically disjoint blocks remains unverified. Using Sentinel-1, Sentinel-2, and Landsat-8 images acquired during the mowing season (August to September 2023) together with field survey data, we first applied the random forest-SHAP algorithm to select the optimal features from 70 texture features and constructed a multimodal remote sensing dataset. We then proposed MAD-Net (Multi-Modal Attention Fusion Network with Dynamic Weighting) to exploit the mowing-related information in both optical and SAR data, and compared it with other models. The results indicate that the CNNLSTMAttention model, which integrates convolutional neural networks, long short-term memory networks, and convolutional block attention modules, performed best at capturing spatiotemporal variations in the NDVI time series. The U-Net model achieved the highest performance on the optimized texture dataset, while MAD-Net, which consists of three subnetworks that each target a different feature type, reached an identification accuracy of 92.59% in the SAR-covered western sub-region under a spatially random train/test split.
Ablation studies show that the NDVI time series is the most informative single modality, while texture and SAR features provide complementary information, and that the proposed dynamic weighting module outperforms conventional fusion strategies. By combining multimodal remote sensing data with deep learning models, this study offers an approach to large-scale binary classification of mown versus non-mown grassland. It provides a comparative basis for the timely identification of mown grasslands and can inform regional grassland management policy.
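The distinction between a spatially random split and a geographically disjoint one can be illustrated with a short sketch. It is not the procedure used in this study: the function names, the square-grid blocking scheme, the `block_size` parameter, and the assumption that sample coordinates are available in a projected system (for example metres) are all hypothetical and chosen only for illustration.

```python
import numpy as np

def random_split(n_samples, test_frac=0.3, seed=0):
    """Spatially random split: neighbouring samples can land on both sides."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    return idx[n_test:], idx[:n_test]

def block_ids(x, y, block_size):
    """Assign each sample to a square grid cell of side `block_size` (hypothetical scheme)."""
    bx = np.floor(np.asarray(x) / block_size).astype(np.int64)
    by = np.floor(np.asarray(y) / block_size).astype(np.int64)
    bx -= bx.min()
    by -= by.min()
    # Encode the (column, row) pair as a single integer label per block.
    return bx * (by.max() + 1) + by

def block_split(x, y, block_size, test_frac=0.3, seed=0):
    """Geographically disjoint split: whole blocks go to either train or test."""
    ids = block_ids(x, y, block_size)
    blocks = np.unique(ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(blocks)
    n_test = max(1, int(round(test_frac * len(blocks))))
    test_mask = np.isin(ids, blocks[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask), ids

# Synthetic coordinates standing in for sample locations (not real survey data).
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, 500)
y = rng.uniform(0, 100, 500)
train_idx, test_idx, ids = block_split(x, y, block_size=20.0)
# No grid block contributes samples to both the training and the test set.
print(set(ids[train_idx]).isdisjoint(ids[test_idx]))
```

Under the random split, spatially autocorrelated neighbours of a test sample are typically present in the training set, which is why accuracies obtained that way are best read as an upper bound. Re-evaluating a model on indices from a blocked split of this kind is one way to estimate how much of that accuracy carries over to unseen areas.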
Yang et al. (Mon,) studied this question.