This paper addresses the important and challenging task of large-scale unsupervised semantic segmentation (LUSS). We present the first attempt to unleash the power of foundation models (FMs) for LUSS, a dense prediction task, and our main objective is to provide simple, effective, and efficient solutions for it, which we call Prompting foundation models for LUSS (PLUSS). First, we propose a cascade framework, PLUSS\_, that effectively combines CLIPS, Grounding DINO, and SAM in a zero-shot manner. This cascade architecture automatically generates semantic and spatial prompts for SAM, establishing a strong baseline that significantly outperforms previous state-of-the-art methods. Building on this foundation, we propose PLUSS\_, which addresses the critical bottleneck of prompt quality through two novel tuner modules: a semantic tuner that enhances fine-grained category discrimination via visual prompt tuning, and a box tuner that improves object localization through cross-modal feature fusion. Both tuners are optimized using knowledge already present in the foundation models themselves, with self-supervised signals derived from internal model consistency. This approach requires neither external supervision nor updates to the foundation models' parameters. Extensive experiments on the ImageNet-S benchmarks demonstrate that PLUSS\_ surpasses the previous best method by 39.6%, 27.3%, and 22.6% in mIoU for 50, 300, and 919 categories, respectively. Our approach exhibits robust category-shape representation across varying object sizes and dataset scales while maintaining strong generalization to open-vocabulary tasks. The proposed framework provides a solid baseline for adapting foundation models to downstream vision tasks. Code is available at https://github.com/Miss-Jo/PLUSSbeta.
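The zero-shot cascade can be pictured as three stages: image-level category selection produces semantic prompts, text-grounded detection turns them into box (spatial) prompts, and a box-prompted segmenter produces masks that are merged into a label map. The following is a minimal sketch under stated assumptions, not the authors' implementation. `classify`, `detect`, and `segment` are hypothetical callables standing in for CLIPS, Grounding DINO, and SAM. The top-k category selection and the rule that higher-scoring masks overwrite lower-scoring ones are illustrative assumptions; the abstract does not say how PLUSS selects categories or resolves overlaps.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; hypothetical convention
Mask = List[List[bool]]          # H x W binary mask


@dataclass
class Detection:
    category: int  # index into category_names
    box: Box
    score: float


def cascade_segment(
    image,
    category_names: Sequence[str],
    classify: Callable[[object, Sequence[str]], List[float]],  # CLIP-like stand-in: one score per name
    detect: Callable[[object, str], List[Tuple[Box, float]]],  # Grounding-DINO-like stand-in: text -> boxes
    segment: Callable[[object, Box], Mask],                    # SAM-like stand-in: box prompt -> H x W mask
    height: int,
    width: int,
    top_k: int = 3,
) -> List[List[int]]:
    """Return an H x W label map: 0 = background, c + 1 = category index c."""
    # Stage 1 (semantic prompts): keep the top-k categories by image-level score.
    scores = classify(image, category_names)
    ranked = sorted(range(len(category_names)), key=lambda c: scores[c], reverse=True)
    chosen = ranked[:top_k]

    # Stage 2 (spatial prompts): ground each chosen category name to scored boxes.
    detections = [
        Detection(c, box, s)
        for c in chosen
        for box, s in detect(image, category_names[c])
    ]

    # Stage 3 (masks): paint masks in ascending score order, so that
    # higher-confidence detections overwrite lower-confidence ones.
    labels = [[0] * width for _ in range(height)]
    for det in sorted(detections, key=lambda d: d.score):
        mask = segment(image, det.box)
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    labels[y][x] = det.category + 1
    return labels


# Usage with toy stand-ins (hypothetical; real models would replace these).
def toy_classify(image, names):
    return [0.9 if n == "dog" else 0.1 for n in names]


def toy_detect(image, name):
    return [((0, 0, 2, 2), 0.8)] if name == "dog" else []


def toy_segment(image, box):
    x0, y0, x1, y1 = box
    # Fill the box region of a fixed 3 x 4 mask.
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(4)] for y in range(3)]


labels = cascade_segment(None, ["cat", "dog"], toy_classify, toy_detect, toy_segment,
                         height=3, width=4, top_k=1)
print(labels)  # [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 0, 0]]
```

Keeping the three models behind plain callables makes the data flow explicit: only category names pass from the first stage to the second, and only boxes pass from the second to the third. In the abstract's framing, those two handoffs are where the semantic tuner and the box tuner of the tuned variant intervene to improve prompt quality.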
DOI: https://doi.org/10.1109/tpami.2026.3673339
Jiaojiao Su
Central South University
Qin Luo
Chinese Academy of Medical Sciences & Peking Union Medical College
Shuzhou Sun
National University of Defense Technology
IEEE Transactions on Pattern Analysis and Machine Intelligence
University of Oulu