March 3, 2026

AttriPrompt: Class Attribute-Aware Prompt Tuning for Vision-Language Model

Key Points

AttriPrompt achieves superior performance, especially on tail classes in imbalanced scenarios, which highlights the value of class-tailored prompts.
Prompts built from class attributes help the model capture class semantics and generalize, even for classes with limited data.
The proposed framework uses an attribute pool to model essential class attributes for prompt tuning.
The results motivate further study of class-level characteristics and targeted approaches for imbalanced learning.

Abstract

Prompt tuning has proven to be an effective alternative to fine-tuning for adapting pre-trained vision-language models (VLMs) to downstream tasks. Among existing approaches, class-shared prompts learn a unified prompt shared across all classes, while sample-specific prompts generate distinct prompts tailored to each individual sample. However, both approaches often struggle to adequately capture the unique characteristics of underrepresented classes, particularly in imbalanced scenarios where data for tail classes is scarce. To alleviate this issue, we propose an attribute-aware prompt tuning framework that promotes a more balanced understanding for imbalanced tasks by explicitly modeling critical class-level attributes. The key intuition is that, from the perspective of class, essential attributes tend to be relatively consistent across classes, regardless of sample sizes. Specifically, we build an attribute pool to learn potential semantic attributes of classes based on VLMs. For each input sample, we generate a unique attribute-aware prompt by selecting the relevant class attributes from the pool through a matching mechanism. This design enables the model to capture essential class semantics and generate informative prompts, even for classes with limited data. Additionally, we introduce a ProAdapter module to facilitate the transfer of foundational knowledge from VLMs while enhancing generalization to underrepresented classes in imbalanced settings. Extensive experiments on standard and imbalanced few-shot tasks demonstrate that our model achieves superior performance, especially on tail classes.
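
The matching step described in the abstract (selecting relevant class attributes from a pool for each input sample) can be illustrated with a small sketch. This is not the authors' implementation: the key/value layout of the pool, the cosine-similarity score, the top-k selection, and every name below (`cosine`, `select_attribute_prompt`, `pool_keys`, `pool_values`, `k`) are assumptions made for illustration, and the toy vectors stand in for learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_attribute_prompt(image_feature, pool_keys, pool_values, k=2):
    """Pick the k pool entries whose keys best match the input feature.

    Hypothetical layout: pool_keys[i] is a matching key and pool_values[i] is
    the attribute vector used as a prompt token. Returns the selected indices
    (best match first) and the corresponding attribute vectors.
    """
    scores = [cosine(image_feature, key) for key in pool_keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    prompt = [pool_values[i] for i in top]
    return top, prompt


# Toy pool with three attributes in a 2-D feature space (illustrative values only).
pool_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pool_values = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

indices, prompt = select_attribute_prompt([0.9, 0.1], pool_keys, pool_values, k=2)
print(indices)  # indices of the two best-matching attributes
print(prompt)   # their attribute vectors, used here as prompt tokens
```

Because the pool is shared across samples in this sketch, a tail-class sample with few training examples can still retrieve attribute vectors that were shaped by other samples, which is one way to read the abstract's claim about informative prompts for classes with limited data.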

Cite This Study

Su et al. studied this question.

synapsesocial.com/papers/69a75c6bc6e9836116a25493 https://doi.org/10.1109/tip.2026.3657216