ABSTRACT Existing few‐shot learning methods encounter significant bottlenecks in semantic enhancement and data augmentation. Traditional prompt templates are limited by their fixed format, which makes it difficult to fully capture category characteristics. Category description methods based on large language models, on the other hand, are susceptible to category polysemy, which can result in semantic bias. We therefore propose a multimodal semantic enhancement (MSE) module, which uses a multimodal large model to jointly analyze the visual‐semantic relationship between category names and example samples. By leveraging visual information to guide the generation of discriminative category descriptions, MSE effectively mitigates semantic polysemy. To address insufficient support set data, we introduce a multimodal image generation (MIG) module, which uses text‐to‐image models to generate diverse images from various textual information. Additionally, we draw inspiration from prototypical networks and combine them with Gaussian discriminant analysis to build a training‐free visual‐textual classifier. Our method (MSAG) significantly improves classification accuracy across 15 benchmark datasets, validating the effectiveness of the multimodal information collaborative enhancement strategy in alleviating data scarcity.
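The abstract only names the ingredients of the training‐free classifier (class prototypes, Gaussian discriminant analysis, and a visual‐textual combination), so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes image and text features have already been extracted by some encoder, uses a shared covariance with a hypothetical `shrinkage` term, and fuses the two scores with a hypothetical weight `alpha`; the function names are illustrative.

```python
import numpy as np


def fit_gda(features, labels, num_classes, shrinkage=1e-2):
    """Closed-form Gaussian discriminant analysis with a shared covariance.

    features: (N, D) support-set image features (assumed precomputed).
    labels:   (N,) integer class ids in [0, num_classes).
    shrinkage is a hypothetical regulariser; few-shot covariance
    estimates are often ill-conditioned without one.
    """
    dim = features.shape[1]
    # Class means play the role of prototypes.
    means = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    centered = features - means[labels]
    cov = centered.T @ centered / len(features) + shrinkage * np.eye(dim)
    precision = np.linalg.inv(cov)
    # Linear discriminant: w_c = precision @ mu_c, b_c = -0.5 * mu_c^T w_c
    # (uniform class prior assumed, so the log-prior term is dropped).
    weights = means @ precision
    biases = -0.5 * np.sum(weights * means, axis=1)
    return weights, biases


def gda_logits(queries, weights, biases):
    """Visual scores for (Q, D) query features; no gradient training involved."""
    return queries @ weights.T + biases


def text_logits(queries, text_embeddings):
    """Cosine similarity between queries and per-class text embeddings."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    return q @ t.T


def fuse_logits(visual, textual, alpha=1.0):
    """Weighted sum of the two score matrices; alpha is an assumed knob."""
    return visual + alpha * textual
```

In this sketch, generated images would simply be appended to `features` and `labels` before calling `fit_gda`, and richer category descriptions would change only the rows of `text_embeddings`. How the paper actually weights or calibrates the two branches is not stated in the abstract.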
Jia Zhao
Zhang Cao
Huiling Wang
IET Image Processing
Fuyang Normal University
Zhao et al. studied this question.
www.synapsesocial.com/papers/68af5f07ad7bf08b1eae1601 — DOI: https://doi.org/10.1049/ipr2.70189