What question did this study set out to answer?

The study asks whether a single universal adversarial perturbation can deceive scene text recognition (STR) models, including in a black-box setting. Exposing such weaknesses is intended to help make STR models more robust.

February 28, 2026 | Open Access

Beyond single image: modified saliency aggregation for universal attacks on STR models

Key Points

Introduced a novel universal adversarial attack method for STR models.
Used a modified saliency map approach.
Aggregated saliency information across multiple samples to generate universal adversarial perturbations.
Achieved a high attack success rate against STR models.
Exhibited strong transferability of attacks across different STR models.
Effectively worked in a black-box setting, overcoming limitations of existing methods.

Abstract

In the era of deep learning, Scene Text Recognition (STR) has become a vital technology for extracting textual information from natural images, with broad practical applications (e.g., autonomous driving). Nevertheless, STR models remain susceptible to adversarial attacks, and studying these attacks can help enhance the robustness of STR models. Current attack strategies for STR typically generate adversarial examples on a per-image basis, often requiring a large number of queries or full knowledge of the target model, which makes them difficult to apply in practice. To address these problems, this paper introduces a novel universal adversarial attack method designed for STR models. By incorporating a modified saliency map approach and aggregating saliency information across multiple samples, we generate universal adversarial perturbations (UAPs) that effectively deceive STR models. Experimental results show that the proposed method achieves a high attack success rate and exhibits strong transferability across different models, even in a black-box setting.
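The idea of aggregating saliency across samples into one shared perturbation can be illustrated with a minimal sketch. This is not the paper's method: the summary does not specify the modified saliency map, so the sketch assumes a toy linear softmax classifier in place of an STR model and plain input gradients in place of the modified saliency. All function names and parameters (`input_saliency`, `universal_perturbation`, `eps`, `step`, `iters`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, x, y):
    # Mean cross-entropy loss of a linear softmax model with weights W.
    p = softmax(x @ W.T)
    return -np.log(p[np.arange(len(y)), y]).mean()

def input_saliency(W, x, y):
    # Assumption: "saliency" here is the per-sample gradient of the
    # cross-entropy loss w.r.t. the input, not the paper's modified map.
    p = softmax(x @ W.T)              # shape (n_samples, n_classes)
    p[np.arange(len(y)), y] -= 1.0    # gradient w.r.t. the logits
    return p @ W                      # shape (n_samples, n_dims)

def universal_perturbation(W, X, y, eps=0.5, step=0.05, iters=50):
    # One perturbation shared by every sample, kept inside an L-inf ball.
    delta = np.zeros(X.shape[1])
    for _ in range(iters):
        # Aggregate saliency over all samples by averaging.
        g = input_saliency(W, X + delta, y).mean(axis=0)
        # Signed ascent step, then projection back onto the eps-ball.
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return delta

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 20))          # toy model: 5 classes, 20 features
X = rng.normal(size=(200, 20))        # toy "images"
y = (X @ W.T).argmax(axis=1)          # labels the clean model predicts
delta = universal_perturbation(W, X, y)
fooled = ((X + delta) @ W.T).argmax(axis=1) != y
print(f"fraction of samples fooled: {fooled.mean():.2f}")
```

Because the same `delta` is added to every input, it is computed once and reused, which is what distinguishes a universal perturbation from the per-image attacks the abstract describes.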
