What question did this study set out to answer?

This research aims to improve the robustness of foundation models against adversarial attacks by exploiting the fact that such attacks often fail to transfer from one model to another.

April 19, 2026 · Open Access

Leveraging Attack Non-Transferability to Boost Adversarial Robustness for Foundation Models

Key Points

Developed a novel adversarial defense framework utilizing non-transferability.
Constructed a common embedding space for multi-modal foundation models.
Introduced a detection scheme based on feature distances to identify attack targets.
Adaptively switched prediction outputs to mitigate attacks.
Demonstrated improved adversarial robustness compared to standard fine-tuning methods.
Outperformed state-of-the-art adversarial defenses in experimental evaluations.

Abstract

This paper presents a novel adversarial defense framework that strategically exploits the non-transferability of adversarial attacks across multi-modal foundation models. While Contrastive Language–Image Pre-training (CLIP) models demonstrate remarkable zero-shot capabilities, they remain vulnerable to adversarial samples. Adversarial fine-tuning is widely adopted as a standard defense, yet the resulting robustness against sophisticated white-box attacks is often insufficient. To address this limitation, we aim to boost the robustness of an adversarially fine-tuned model by utilizing a pre-trained auxiliary model to leverage attack non-transferability. Specifically, we construct a common embedding space and introduce a detection scheme that identifies the attack target based on feature distances. By adaptively switching the prediction output, we effectively mitigate attacks. Experimental results demonstrate that our approach outperforms state-of-the-art adversarial fine-tuning methods in terms of adversarial robustness.
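The detect-and-switch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine distance, the threshold `tau`, and the rule "switch to the auxiliary model when the two features disagree" are all assumptions made for illustration. It also assumes that both models' image features and the class text features have already been mapped into one common embedding space.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cosine_distance(a, b):
    """Cosine distance between two 1-D feature vectors."""
    return 1.0 - float(np.dot(l2_normalize(a), l2_normalize(b)))


def zero_shot_predict(image_feat, text_feats):
    """Return the index of the class text feature most similar to the image feature."""
    sims = l2_normalize(text_feats) @ l2_normalize(image_feat)
    return int(np.argmax(sims))


def switched_prediction(feat_robust, feat_aux, text_feats, tau=0.5):
    """Hypothetical switching rule (an assumption, not taken from the paper).

    feat_robust: image feature from the adversarially fine-tuned model
    feat_aux:    image feature from the pre-trained auxiliary model
    text_feats:  class text features, shape (num_classes, dim)
    All features are assumed to live in the same common embedding space.
    """
    distance = cosine_distance(feat_robust, feat_aux)
    if distance > tau:
        # Large disagreement: assume the fine-tuned model was the attack
        # target and that the perturbation did not transfer, so use the
        # auxiliary model's feature for the prediction.
        return zero_shot_predict(feat_aux, text_feats), "auxiliary", distance
    # Features agree: keep the fine-tuned model's prediction.
    return zero_shot_predict(feat_robust, text_feats), "robust", distance


if __name__ == "__main__":
    # Toy 3-class example with one-hot text features.
    text_feats = np.eye(3)
    clean = switched_prediction(np.array([1.0, 0.0, 0.0]),
                                np.array([0.9, 0.1, 0.0]), text_feats)
    attacked = switched_prediction(np.array([0.0, 1.0, 0.0]),
                                   np.array([1.0, 0.0, 0.0]), text_feats)
    print(clean, attacked)
```

In this toy setup the threshold is fixed by hand; in practice it would have to be chosen on held-out data, and the abstract does not say how the actual detection scheme decides which model was attacked.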

Authors

Koshiro Toishi

Keisuke Maeda

Ren Togo

Journals

Applied Sciences

Institutions

Hokkaido University
