High-resolution remote sensing (RS) images exhibit complex backgrounds, large intra-class variability, and low inter-class differences, posing substantial challenges for semantic segmentation. Although existing class-level contextual modeling methods partially alleviate these issues, they often overlook the importance of accurate and discriminative class representations and fail to effectively handle hard samples during training. To address these limitations, we propose CRECA-Net, a class representation-enhanced class-aware network designed from two complementary perspectives: class prototype refinement and difficulty-aware learning. Specifically, we introduce a class prototype refinement (CPR) module that improves class representations through pixel selection, confidence-aware contribution weighting, and an inter-class prototype separation loss, yielding more reliable and discriminative class centers. In addition, class-level context aggregation (CLCA) modules capture pixel-to-class prototype correlations via cross-attention to inject class-aware semantics into decoder features, thereby reducing interference from cluttered backgrounds and visually similar categories. Furthermore, a difficulty-aware (DA) loss dynamically estimates pixel-wise difficulty and redistributes the loss weights within each image, gradually shifting the learning focus from easy to hard samples while maintaining training stability. Extensive experiments on two benchmark RS segmentation datasets demonstrate that CRECA-Net consistently outperforms state-of-the-art methods across multiple evaluation metrics.
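The abstract does not give the exact formulation of the CPR and CLCA modules, so the following is only a minimal sketch of the two ideas it names: building class prototypes from selected, confidence-weighted pixels, and relating a pixel to those prototypes through cross-attention. The function names, the confidence `threshold`, and the scaled dot-product form of the attention are assumptions made for illustration. They are not taken from CRECA-Net itself, and the inter-class prototype separation loss and the DA loss are omitted.

```python
import math


def class_prototypes(features, probs, threshold=0.5):
    """Confidence-weighted class centers (illustrative, not the paper's exact CPR).

    features: list of N feature vectors, each of length D.
    probs:    list of N per-class probability vectors, each of length K.
    A pixel contributes to the prototype of its predicted class only if its
    confidence is at least `threshold` (assumed form of pixel selection), and
    its contribution is weighted by that confidence.
    """
    num_classes = len(probs[0])
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    weights = [0.0] * num_classes
    for feat, p in zip(features, probs):
        k = max(range(num_classes), key=lambda c: p[c])  # predicted class
        conf = p[k]
        if conf < threshold:
            continue  # pixel selection: skip unreliable pixels
        weights[k] += conf
        for d in range(dim):
            sums[k][d] += conf * feat[d]
    # Classes with no selected pixels fall back to a zero vector.
    return [[s / w if w > 0 else 0.0 for s in row]
            for row, w in zip(sums, weights)]


def attend_to_prototypes(feature, prototypes):
    """Pixel-to-prototype cross-attention for one pixel (assumed scaled dot product).

    Returns the attention weights over classes and the class-aware context
    vector, i.e. the attention-weighted sum of the prototypes.
    """
    dim = len(feature)
    scores = [sum(f * p for f, p in zip(feature, proto)) / math.sqrt(dim)
              for proto in prototypes]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    context = [sum(a * proto[d] for a, proto in zip(attn, prototypes))
               for d in range(dim)]
    return attn, context


# Toy example: four "pixels" with 2-D features and two classes.
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.5, 0.5]]
probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.55, 0.45]]
protos = class_prototypes(features, probs, threshold=0.6)
attn, context = attend_to_prototypes([1.0, 0.0], protos)
```

In this toy example the fourth pixel is ambiguous (confidence 0.55) and is excluded from both prototypes, while the query pixel attends more strongly to the class whose prototype it resembles. In a real network the same operations would run on batched tensors, with learned projections in the attention.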
Liu et al. (Sat,) studied this question.