What question did this study set out to answer?

The aim is to develop a robust model for few-shot class-incremental learning in remote sensing while mitigating knowledge forgetting and performance degradation.

April 15, 2026 | Open Access

Dynamic Expansion Mixture-of-Experts with Pre-Trained Vision Transformer for Few-Shot Class-Incremental Remote Sensing Scene Classification

Key Points

Developed the DEM-ViT framework incorporating an Adapter-Based Mixture-of-Experts module.
Implemented a Dynamic Expert Expansion strategy to gradually increase model capacity during incremental training.
Introduced Semantic-Guided Feature Alignment to enhance features with textual information.
The proposed framework performs highly competitively against state-of-the-art methods on three remote sensing benchmarks.
Achieved improved knowledge retention while learning new classes with few samples.
Demonstrated effectiveness in reducing overfitting under limited data conditions.

Abstract

Few-Shot Class-Incremental Learning (FSCIL) aims to sequentially learn new classes from very few labelled samples while preventing the forgetting of previously acquired knowledge, which has important practical value for remote sensing scene classification (RSSC). Recent studies have shown that applying a Vision Transformer (ViT) pre-trained on natural image datasets to FSCIL tasks can achieve significantly superior performance. Nevertheless, a substantial domain distribution gap exists between natural images and remote sensing images, which leads to severe performance degradation when such models are directly transferred to RSSC. To address the domain gap alongside FSCIL’s inherent stability–plasticity dilemma and overfitting under data scarcity, we propose a Dynamic Expansion Mixture-of-Experts with Pre-trained Vision Transformer (DEM-ViT) framework. Specifically, to alleviate the domain discrepancy, DEM-ViT incorporates an Adapter-Based Mixture-of-Experts (ABMoE) module, which captures the diverse visual patterns of remote sensing scenes through feature reconstruction in the representation space and collaborative learning among multiple experts. Furthermore, to address the stability–plasticity dilemma in FSCIL, we propose a Dynamic Expert Expansion (DEE) strategy, which progressively expands the model capacity along the incremental sessions. DEE provides sufficient space for learning new knowledge while mitigating the forgetting of previous knowledge. In addition, we propose a Semantic-Guided Feature Alignment (SGFA) method to reduce the risk of overfitting under data-scarce conditions. SGFA leverages textual information to construct robust text prototypes and uses them to calibrate the visual feature space. Extensive experiments across three benchmarks indicate that our framework exhibits highly competitive performance compared with state-of-the-art methods.
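To make the adapter-based mixture-of-experts and expert-expansion ideas in the abstract more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the class names `Adapter` and `AdapterMoE`, the bottleneck design, the softmax router, the zero initialization of the up-projection, and the choice to freeze existing experts on expansion are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project."""
    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.gauss(0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Assumption: zero-initialized up-projection, so a new expert
        # starts as a no-op and does not disturb existing features.
        self.up = [[0.0] * bottleneck for _ in range(dim)]
        # Flag only; a real training loop would skip frozen experts.
        self.frozen = False

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]
        return matvec(self.up, h)

class AdapterMoE:
    """Hypothetical adapter-based MoE layer with a softmax router."""
    def __init__(self, dim, bottleneck, num_experts, seed=0):
        self.dim, self.bottleneck = dim, bottleneck
        self.rng = random.Random(seed)
        self.experts = []
        self.router = []  # one row of routing weights per expert
        for _ in range(num_experts):
            self._add_expert()

    def _add_expert(self):
        self.experts.append(Adapter(self.dim, self.bottleneck, self.rng))
        self.router.append([self.rng.gauss(0, 0.02) for _ in range(self.dim)])

    def expand(self, n_new=1):
        """Assumed expansion rule: freeze old experts, append new ones."""
        for expert in self.experts:
            expert.frozen = True
        for _ in range(n_new):
            self._add_expert()

    def gate(self, x):
        return softmax(matvec(self.router, x))

    def __call__(self, x):
        weights = self.gate(x)
        out = list(x)  # residual connection around the expert mixture
        for w, expert in zip(weights, self.experts):
            y = expert(x)
            out = [o + w * v for o, v in zip(out, y)]
        return out

# Base session with two experts, then one incremental session.
moe = AdapterMoE(dim=8, bottleneck=2, num_experts=2)
x = [0.1 * i for i in range(8)]
before = moe(x)
moe.expand(n_new=1)
after = moe(x)
```

In this sketch, adding an expert leaves the layer's output unchanged until the new expert is trained, which is one simple way to add capacity for new classes without immediately perturbing earlier representations. Whether DEM-ViT uses this exact mechanism is not stated in the abstract.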

Cite this study

Wu et al. studied this question.

www.synapsesocial.com/papers/69df2c2fe4eeef8a2a6b134f — DOI: https://doi.org/10.3390/rs18081145

Authors

Yunhao Wu

Xiang Li

Jianlin Zhang

Journals

Remote Sensing

Institutions

University of Chinese Academy of Sciences

Institute of Optics and Electronics, Chinese Academy of Sciences
