What question did this study set out to answer?

To improve the performance of vision-language models in multi-domain task incremental learning while preserving zero-shot recognition capability.

December 21, 2025 · Open Access

Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models

Key Points

To improve the performance of vision-language models in multi-domain task incremental learning while preserving zero-shot recognition capability.
Proposed AFA framework with two modules: against forward-forgetting adapter and against backward-forgetting adapter.
Ran experiments comparing AFA with existing state-of-the-art approaches on few-shot and zero-shot tasks.
AFA significantly outperforms existing methods in few-shot multi-domain task incremental learning.
Demonstrated improved transferability over the inherent zero-shot performance of CLIP.

Abstract

This study aims to address the problem of multi-domain task incremental learning (MTIL), which requires that vision-language models (VLMs) continuously acquire new knowledge while maintaining their inherent zero-shot recognition capability. Existing paradigms delegate the testing of unseen-domain samples to the original CLIP, which only prevents the degradation of the model's zero-shot capability but fails to enhance the generalization of the VLM further. To this end, we propose a novel MTIL framework, named AFA, which comprises two core modules: (1) an against forward-forgetting adapter that learns task-invariant information for each dataset in the incremental tasks to enhance the zero-shot recognition ability of VLMs; (2) an against backward-forgetting adapter that strengthens the few-shot learning capability of VLMs while supporting incremental learning. Extensive experiments demonstrate that the AFA method significantly outperforms existing state-of-the-art approaches, especially in few-shot MTIL tasks, and surpasses the inherent zero-shot performance of CLIP in terms of transferability. The code is provided in the Supplementary Material.
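For readers unfamiliar with the setting, the following toy sketch may help. It is not the AFA method, whose module design the abstract does not detail. It only illustrates, under simplifying assumptions, the two generic ingredients the abstract refers to: zero-shot recognition by comparing an image embedding with per-class text embeddings, and a lightweight adapter applied on top of frozen embeddings. The function names (`zero_shot_predict`, `residual_adapter`), the blending ratio `alpha`, and the two-dimensional embeddings are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(image_emb, text_embs):
    """Pick the label whose text embedding is most similar to the image embedding."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))


def residual_adapter(emb, weight, alpha=0.5):
    """Hypothetical adapter: blend the frozen embedding with a linear map of it.

    Returns (1 - alpha) * emb + alpha * (weight @ emb).
    """
    adapted = [sum(w * x for w, x in zip(row, emb)) for row in weight]
    return [(1 - alpha) * x + alpha * a for x, a in zip(emb, adapted)]


# Toy embeddings (assumed values, not taken from any real model).
text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
image_emb = [0.6, 0.5]

# Prediction from the frozen embedding alone.
frozen_pred = zero_shot_predict(image_emb, text_embs)

# A hypothetical task-specific adapter weight that emphasizes the second feature.
task_weight = [[0.0, 0.0], [0.0, 2.0]]
adapted_emb = residual_adapter(image_emb, task_weight)
adapted_pred = zero_shot_predict(adapted_emb, text_embs)

# An identity weight leaves the frozen embedding, and so the prediction, unchanged.
identity = [[1.0, 0.0], [0.0, 1.0]]
unchanged_emb = residual_adapter(image_emb, identity)
```

In this toy setup the adapter can change the prediction for a task it was tuned on, while an identity adapter reproduces the frozen model's behavior. That is the tension the abstract describes: adapting to new tasks without eroding the zero-shot capability of the original model.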

Cite this study

Dong et al. (2025) studied this question.

www.synapsesocial.com/papers/69473b64db9c958d0dfca9e3 — DOI: https://doi.org/10.48550/arxiv.2505.07690

Authors

Songlin Dong

Chris Ding

Jiangyang Li
