January 1, 2025 · Open Access

MSAG: Semantic Enhancement and Image Generation Based on Multimodal Large Models for Supporting Few‐Shot Learning

Key Points

MSAG improves classification accuracy across 15 benchmark datasets, demonstrating its effectiveness in few-shot learning.
The multimodal semantic enhancement (MSE) module combines visual and semantic information to generate more discriminative category descriptions.
The multimodal image generation (MIG) module uses text-to-image models to create diverse training images.
The approach exploits visual-semantic relationships to mitigate semantic bias and data scarcity in traditional few-shot learning.

Abstract

Existing few‐shot learning methods encounter significant bottlenecks in semantic enhancement and data augmentation. Traditional prompt templates are limited by their fixed format, which makes it difficult to fully capture the characteristics of categories. Category description methods based on large language models, in turn, are susceptible to category polysemy, which can result in semantic bias. We therefore propose a multimodal semantic enhancement (MSE) module, which uses a multimodal large model to jointly analyze the visual‐semantic relationship between category names and example samples. By using visual information to guide the generation of discriminative category descriptions, MSE mitigates semantic polysemy. To address insufficient support set data, we introduce a multimodal image generation (MIG) module, which uses text‐to‐image models to generate diverse images from varied textual information. Additionally, we draw inspiration from prototypical networks and combine them with Gaussian discriminant analysis to build a training‐free visual‐textual classifier. Our method (MSAG) significantly improves classification accuracy across 15 benchmark datasets, validating the effectiveness of the multimodal information collaborative enhancement strategy in alleviating data scarcity.
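
The abstract gives no implementation details for the training‐free visual‐textual classifier, so the following is only a rough sketch of how class prototypes and Gaussian discriminant analysis (GDA) with a shared covariance could be fused with text‐embedding similarity. Everything here is an assumption for illustration: the function names, the `shrinkage`, `alpha`, and `temperature` parameters, the additive fusion rule, and the idea that image and text features come from some pretrained encoder. It is not the authors' method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fit_gda(features, labels, num_classes, shrinkage=1e-2):
    """Closed-form GDA: class means (prototypes) and a shared covariance.

    No gradient steps are involved, which is what makes it training-free.
    `shrinkage` (hypothetical default) keeps the covariance invertible
    when there are only a few support samples per class.
    """
    dim = features.shape[1]
    means = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    centered = features - means[labels]
    cov = centered.T @ centered / len(features) + shrinkage * np.eye(dim)
    precision = np.linalg.inv(cov)
    # Linear discriminant per class: w_c = precision @ mu_c,
    # b_c = -0.5 * mu_c^T precision mu_c (uniform class prior assumed).
    weights = means @ precision
    biases = -0.5 * np.einsum("cd,cd->c", weights, means)
    return means, weights, biases


def classify(queries, weights, biases, text_embeds, alpha=1.0, temperature=100.0):
    """Add text-similarity logits and GDA logits, then take the argmax.

    `alpha` and `temperature` are illustrative fusion hyperparameters.
    """
    queries = l2_normalize(queries)
    gda_logits = queries @ weights.T + biases
    text_logits = temperature * queries @ l2_normalize(text_embeds).T
    return np.argmax(text_logits + alpha * gda_logits, axis=1)
```

In a setup like the one the abstract describes, the support features passed to `fit_gda` would include both the original few‐shot samples and generated images, and `text_embeds` would hold one embedding per enhanced category description; both of those inputs are assumed to be precomputed here.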

Authors

Jia Zhao

Zhang Cao

Huiling Wang

Journals

IET Image Processing

Institutions

Fuyang Normal University
