What question did this study set out to answer?

The research aims to enhance multi-label remote sensing scene classification by enabling autonomous label discovery without external prompts.

February 8, 2026

Open Access

From Prompts to Self-Prompts: Parameter-Efficient Multi-Label Remote Sensing via Mask-Guided Classification

Key Points

Introduced SAFI-XRS, a parameter-efficient framework that trains fewer than 2% (about 2.4M) of the parameters of a 332M-parameter segmenter.
Implemented a Semantic Query Generator (SQR) that generates class-aligned queries directly from images, replacing external prompts.
Developed a Mask-Guided Classifier (MGC) that consolidates spatial evidence into label confidences.
Conducted experiments on UCM-ML, DFC15-ML, and AID-ML datasets.
SAFI-XRS outperformed text-prompted foundation segmenters by +3.9/+3.8 in mean Average Precision (mAP) on balanced datasets.
Achieved 6.8× parameter efficiency compared to traditional expert classifiers.
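The gains above are reported in mean Average Precision. The summary does not spell out the exact averaging protocol, so what follows is a minimal sketch of one common variant, assuming macro mAP: non-interpolated average precision computed per label and then averaged over labels. The function names are illustrative, not taken from the paper. If mAP is reported as a percentage, a +3.9 gain would correspond to roughly 0.039 on the 0 to 1 scale this code returns.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one label: mean precision@k at each positive's rank k."""
    # Rank images by descending score for this label (stable sort keeps input order on ties).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]], label_matrix: Sequence[Sequence[int]]
) -> float:
    """Macro mAP: rows are images, columns are labels; average AP over label columns."""
    num_labels = len(score_matrix[0])
    aps = []
    for c in range(num_labels):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        # One common convention: skip labels with no positive images.
        if any(labels):
            aps.append(average_precision(scores, labels))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    scores = [[0.9, 0.2, 0.7], [0.1, 0.8, 0.6], [0.4, 0.3, 0.2]]
    labels = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
    print(round(mean_average_precision(scores, labels), 3))
```

With this toy data, labels 0 and 1 rank all of their positives first (AP = 1.0), while label 2 ranks its only positive last (AP = 1/3), so the script prints 0.778 (that is, 7/9).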

Abstract

Multi-label remote sensing scene classification (MLRSSC) requires autonomous discovery of all relevant land-cover categories without human guidance. Conventional expert classifiers return only label vectors without spatial evidence, while foundation segmenters (e.g., SAM, RemoteSAM) remain passively dependent on external prompts—misaligned with autonomous interpretation. We introduce SAFI-XRS, a parameter-efficient self-prompted framework that transforms passive prompting into active scene parsing. By training only <2% of a 332M-parameter segmenter (∼2.4M parameters), SAFI-XRS generates class-aligned queries from images via a Semantic Query Generator (SQR), replacing external prompts with self-generated conditioning. A Mask-Guided Classifier (MGC) aggregates spatial evidence into label confidences, enabling mask-based explainability. Experiments on UCM-ML, DFC15-ML, and AID-ML show SAFI-XRS surpasses text-prompted foundation segmenters (+3.9/+3.8 mAP on balanced datasets) while achieving 6.8× parameter efficiency compared to expert models, validating a practical path toward autonomous, explainable RS scene understanding.
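The abstract states that the MGC aggregates spatial evidence into label confidences but does not describe how. As a purely illustrative stand-in (not the authors' MGC, which is a trained module), the sketch below assumes the segmenter emits one mask-logit grid per class query. It converts each grid into a label confidence with parameter-free top-k pooling and keeps the thresholded mask as that label's explanation. The names `classify_from_masks`, `mask_to_confidence`, and `top_fraction` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]  # H x W mask logits for one class query (assumed input format)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mask_to_confidence(logits: Grid, top_fraction: float = 0.1) -> float:
    """Hypothetical pooling: mean probability of the top `top_fraction` of pixels."""
    probs = sorted((sigmoid(v) for row in logits for v in row), reverse=True)
    k = max(1, int(len(probs) * top_fraction))
    return sum(probs[:k]) / k


def classify_from_masks(
    mask_logits: Dict[str, Grid], threshold: float = 0.5, top_fraction: float = 0.1
) -> Dict[str, Tuple[float, bool, List[List[int]]]]:
    """Map each class's mask logits to (confidence, predicted?, binary evidence mask)."""
    results = {}
    for label, logits in mask_logits.items():
        confidence = mask_to_confidence(logits, top_fraction)
        # Thresholded mask = the spatial evidence shown alongside the label.
        evidence = [[int(sigmoid(v) >= threshold) for v in row] for row in logits]
        results[label] = (confidence, confidence >= threshold, evidence)
    return results


if __name__ == "__main__":
    # Toy 4x4 scene: a 2x2 "building" blob and no water anywhere.
    building = [[-4.0] * 4 for _ in range(4)]
    for r in (1, 2):
        for c in (1, 2):
            building[r][c] = 3.0
    water = [[-4.0] * 4 for _ in range(4)]
    out = classify_from_masks({"building": building, "water": water}, top_fraction=0.25)
    for name, (conf, present, _) in out.items():
        print(f"{name}: confidence={conf:.3f}, present={present}")
```

Top-k pooling is just one plausible choice. Averaging over every pixel would let a small object be swamped by background, whereas pooling only the strongest responses keeps its confidence high. The returned binary mask is what makes each predicted label inspectable.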

Cite This Study

Qu et al. studied this question.

synapsesocial.com/papers/698828fd0fc35cd7a8848ee2
https://doi.org/10.3390/rs18030518
