September 10, 2025 | Open Access

A Multi-Task Learning Framework Based on CLIP and Adapter Modules

Key Points

Our approach achieves up to a 12% performance improvement while adding less than 0.2% additional parameters.
By using lightweight adapter modules, the model maintains CLIP's original zero-shot capabilities.
This framework facilitates adaptation across classification, image-text retrieval, and regression tasks.
It provides significant advantages over conventional transfer strategies, enhancing task generalization.

Abstract

In recent years, with the rapid development of cross-modal learning, pretrained models such as CLIP have demonstrated powerful zero-shot capabilities in image-text alignment tasks, making them central to multimodal research. However, a key challenge remains: how to effectively transfer these capabilities while preserving the strengths of CLIP. To address this, we propose a parameter-efficient multi-task fine-tuning framework, Multi-Task CLIP-Adapter. By inserting lightweight Adapter modules after the frozen CLIP encoder, our method enables unified adaptation across multiple tasks, including classification, image-text retrieval, and regression. Experimental results show that our approach achieves an 8% to 12% performance improvement with less than 0.2% additional parameters, while maintaining the original model's zero-shot capability. Compared to the original CLIP and conventional transfer strategies, the Multi-Task CLIP-Adapter offers significant advantages in parameter efficiency and task generalization, paving a new path for scalable applications of large multimodal models.
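
To give a feel for how a parameter budget of this kind can arise, the following is a minimal sketch of the bookkeeping, not the paper's implementation. It assumes a residual bottleneck adapter (a down-projection followed by an up-projection) placed after each frozen encoder, one linear head per supervised task, and no extra head for retrieval. The names AdapterConfig, linear_head_params, and trainable_overhead are hypothetical, and the embedding width, bottleneck width, head sizes, and backbone size are illustrative assumptions that do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AdapterConfig:
    """Hypothetical residual bottleneck adapter: embed_dim -> bottleneck_dim -> embed_dim."""

    embed_dim: int       # width of the frozen encoder's output embedding (assumed)
    bottleneck_dim: int  # hidden width of the adapter (assumed)

    def num_params(self) -> int:
        # Down-projection weights and bias, then up-projection weights and bias.
        down = self.embed_dim * self.bottleneck_dim + self.bottleneck_dim
        up = self.bottleneck_dim * self.embed_dim + self.embed_dim
        return down + up


def linear_head_params(embed_dim: int, out_dim: int) -> int:
    """Parameters of one linear task head (weights plus bias)."""
    return embed_dim * out_dim + out_dim


def trainable_overhead(
    backbone_params: int,
    adapter: AdapterConfig,
    num_adapters: int,
    head_out_dims: Sequence[int],
) -> Tuple[int, float]:
    """Return (trainable parameter count, fraction relative to the frozen backbone)."""
    trainable = num_adapters * adapter.num_params()
    trainable += sum(linear_head_params(adapter.embed_dim, k) for k in head_out_dims)
    return trainable, trainable / backbone_params


# Illustrative values only; none of these numbers are reported in the abstract.
BACKBONE_PARAMS = 150_000_000  # rough size of a ViT-B-scale CLIP model (assumption)
adapter = AdapterConfig(embed_dim=512, bottleneck_dim=64)

# One adapter per encoder (image and text); a 100-class classification head and a
# scalar regression head. Retrieval is assumed to reuse the adapted embeddings directly.
trainable, fraction = trainable_overhead(
    BACKBONE_PARAMS, adapter, num_adapters=2, head_out_dims=[100, 1]
)

print(f"trainable parameters: {trainable:,}")
print(f"share of frozen backbone: {fraction:.4%}")
```

Under these assumed sizes the trainable share comes to roughly 0.12% of the frozen backbone, which is consistent with the stated budget of less than 0.2%. The actual adapter design and dimensions used in the paper may differ.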

Cite this study

Ji Han studied this question.

www.synapsesocial.com/papers/68c183f09b7b07f3a060f830 — DOI: https://doi.org/10.54254/2755-2721/2025.bj26532

Authors

Ji Han

Journals

Applied and Computational Engineering

Institutions

Harbin University of Science and Technology
