Adapting large VLMs to specialized, long-tailed domains requires a careful balance between task performance and the preservation of pretrained knowledge. Although full-parameter fine-tuning is powerful, it is resource-intensive and can easily overfit on imbalanced data. We propose Adaptive Progressive Fine-Tuning (APFT), a strategy that automates this process. APFT unfreezes layers in stages under an event-triggered mechanism: instead of following a fixed schedule, phase transitions are initiated automatically from real-time training stability signals such as loss volatility and performance plateaus. At each transition, the cosine annealing scheduler is re-initialized and weight decay is adaptively increased to regularize the newly trainable parameters. Experiments on the long-tailed HISTORY-X4 archival dataset indicate that APFT significantly outperforms all baselines, including full fine-tuning and LoRA. The advantage is most pronounced on tail labels, where APFT achieves a 19.9\% relative improvement in text-to-image mAP@10 over the strongest baseline, demonstrating its ability to adapt to new domains while preserving foundational knowledge.
Alijani et al. (Mon,) studied this question.
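The event-triggered transition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PhaseController`, the window size, both thresholds, the use of a population standard deviation as the volatility measure, and the multiplicative weight-decay growth factor are all assumptions chosen for illustration. The sketch advances to the next unfreezing phase only when recent losses are stable and the validation metric has stopped improving, then raises weight decay and resets the step counter that drives a cosine learning-rate cycle.

```python
import math
from collections import deque
from statistics import pstdev


class PhaseController:
    """Hypothetical event-triggered controller for staged layer unfreezing."""

    def __init__(self, num_phases, window=5, vol_tol=0.05, plateau_tol=1e-3,
                 base_wd=0.01, wd_growth=1.5):
        # All defaults are illustrative assumptions, not values from the paper.
        self.num_phases = num_phases
        self.window = window
        self.vol_tol = vol_tol          # max loss std-dev considered "stable"
        self.plateau_tol = plateau_tol  # min metric gain considered "progress"
        self.wd_growth = wd_growth      # weight-decay multiplier per transition
        self.phase = 0
        self.weight_decay = base_wd
        self.steps_in_phase = 0
        self.losses = deque(maxlen=window)
        self.metrics = deque(maxlen=window)

    def update(self, loss, val_metric):
        """Record one evaluation; return True if a phase transition fired."""
        self.losses.append(loss)
        self.metrics.append(val_metric)
        self.steps_in_phase += 1
        # No transition in the final phase or before the window has filled.
        if self.phase >= self.num_phases - 1 or len(self.losses) < self.window:
            return False
        stable = pstdev(self.losses) < self.vol_tol
        plateaued = max(self.metrics) - self.metrics[0] < self.plateau_tol
        if stable and plateaued:
            self.phase += 1                       # unfreeze the next layer group
            self.weight_decay *= self.wd_growth   # regularize new parameters
            self.steps_in_phase = 0               # restarts the cosine cycle
            self.losses.clear()
            self.metrics.clear()
            return True
        return False


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Standard cosine annealing from lr_max down to lr_min over total_steps."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

In a training loop, one would call `update` after each evaluation and compute the learning rate as `cosine_lr(ctrl.steps_in_phase, cycle_len, lr_max)`, where `cycle_len` is a hypothetical per-phase cycle length. When `update` returns `True`, the caller is responsible for marking the next layer group as trainable and passing `ctrl.weight_decay` to the optimizer.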