What question did this study set out to answer?

The aim is to improve classification accuracy for high-resolution remote sensing images while maintaining computational efficiency.

April 22, 2026 | Open Access

MS-DARNet: A Lightweight Multi-Scale Selective Dilated Attention Residual Network for Remote Sensing Scene Classification

Key Points

Proposes a lightweight Multi-Scale Selective Dilated Attention Residual Network (MS-DARNet) for scene classification.
Uses a Multi-branch Dilated Feature Extraction (MDFE) module whose parallel branches apply different dilation rates.
Introduces a Context-Position Aware Attention (CPAA) module to better retain spatial features.
Reaches classification accuracies of 97.78%, 94.53%, and 94.55% on the AID, NWPU-RESISC45, and RSD-WHU46 datasets, respectively.
Keeps complexity low, at 2.50 million parameters and 0.5940 GMACs.
Reports a favorable balance between a lightweight architecture and classification performance.

Abstract

High-resolution remote sensing image (HRRSI) scene classification faces three main challenges: large variations in target scale, complex background interference, and the difficulty of spatially parsing dense objects (such as tightly packed buildings in dense residential areas or scattered aircraft on aprons). Existing models also struggle to balance computational efficiency against classification accuracy. To address these issues, this paper proposes a lightweight Multi-Scale Selective Dilated Attention Residual Network (MS-DARNet). The model uses a Multi-branch Dilated Feature Extraction (MDFE) module, in which parallel convolutional branches with different dilation rates dynamically expand the receptive field and jointly extract multi-scale features without increasing the parameter count. A Context-Position Aware Attention (CPAA) module is also introduced. It combines a large-kernel decomposition strategy, which suppresses irrelevant background noise, with direction-aware feature aggregation, which retains precise spatial coordinates for dense objects. Extensive experiments on the AID, NWPU-RESISC45, and RSD-WHU46 datasets show that MS-DARNet achieves classification accuracies of 97.78%, 94.53%, and 94.55%, respectively, while keeping complexity low at 2.50 M parameters and 0.5940 GMACs. These findings indicate that MS-DARNet balances a lightweight architecture with strong classification performance on complex remote sensing scenes.
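
The claim that dilated branches widen the receptive field without adding parameters follows from how dilation works: it spaces out the kernel taps but does not add any. The sketch below illustrates this with plain arithmetic. It is not the paper's MDFE implementation; the channel widths (64 in, 64 out), the 3x3 kernel, and the dilation rates (1, 2, 3) are hypothetical values chosen for illustration, and the helper names are made up for this example.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along one axis.

    A kernel with k taps and dilation d has (d - 1) gaps between
    neighbouring taps, so it spans k + (k - 1) * (d - 1) pixels.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Learnable parameters of a square 2D convolution.

    Dilation does not appear here: it changes where the taps sample
    the input, not how many weights the kernel holds.
    """
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def describe_branches(in_channels: int, out_channels: int,
                      kernel_size: int, dilations: tuple) -> list:
    """Summarise each parallel branch (all values are illustrative)."""
    return [
        {
            "dilation": d,
            "effective_kernel": effective_kernel_size(kernel_size, d),
            "params": conv2d_param_count(in_channels, out_channels, kernel_size),
        }
        for d in dilations
    ]


if __name__ == "__main__":
    # Hypothetical branch configuration, not taken from the paper.
    for row in describe_branches(64, 64, 3, (1, 2, 3)):
        print(row)
```

Under these assumed settings, every branch holds the same number of weights while its spatial span grows with the dilation rate, which is the property the abstract relies on.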
