March 31, 2024 · Open Access

Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance

Abstract

Large-scale contrastive vision-language pre-trained models yield zero-shot models that achieve competitive performance across a range of image classification tasks without any training on downstream data. Recent work has shown that although further fine-tuning a zero-shot model on reference data improves downstream performance, it compromises the model's robustness to distribution shifts. We begin by examining the conditions required to achieve the goals of robust fine-tuning, drawing on descriptions based on feature distortion theory and joint energy-based models. We then propose a novel robust fine-tuning algorithm, Lipsum-FT, which effectively utilizes the language modeling aspect of vision-language pre-trained models. Extensive experiments on distribution shift scenarios in DomainNet and ImageNet confirm the superiority of Lipsum-FT over existing robust fine-tuning methods.

Cite this study

Nam et al. (2024).

www.synapsesocial.com/papers/68e718ecb6db643587692396
DOI: https://doi.org/10.48550/arxiv.2404.00860

Authors

Giung Nam

Byeongho Heo

Ju Ho Lee