In the era of deep learning, Scene Text Recognition (STR) has become a vital technology for extracting textual information from natural images, with broad practical applications (e.g., autonomous driving). Nevertheless, STR models remain susceptible to adversarial attacks, which can also be leveraged to improve their robustness. Current attack strategies for STR typically generate adversarial examples on a per-image basis and often require a large number of queries or full knowledge of the target model, which makes them difficult to apply in practice. To address these problems, this paper introduces a novel universal adversarial attack method for STR models. By incorporating a modified saliency map approach and aggregating saliency information across multiple samples, we generate universal adversarial perturbations (UAPs) that effectively deceive STR models. Experimental results show that the proposed method achieves a high attack success rate and exhibits strong transferability across different models, even in a black-box setting.
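The abstract does not specify the modified saliency map, so the following is only a generic sketch of the idea of aggregating saliency across samples into a single image-agnostic perturbation, not the authors' method. It assumes that per-sample gradients of some attack loss with respect to grayscale inputs of shape `(H, W)` have already been computed elsewhere. The function names `aggregate_saliency`, `build_uap`, and `apply_uap`, and the parameters `eps` (an L-infinity budget) and `top_k` (number of perturbed pixels), are hypothetical.

```python
import numpy as np

def aggregate_saliency(gradients: np.ndarray) -> np.ndarray:
    """Hypothetical aggregation: normalize each sample's |gradient| map to sum to 1,
    then average over samples. Expects shape (N, H, W); returns (H, W)."""
    maps = np.abs(gradients).astype(np.float64)
    sums = maps.reshape(len(maps), -1).sum(axis=1)
    sums[sums == 0] = 1.0  # avoid division by zero for all-zero maps
    normalized = maps / sums[:, None, None]
    return normalized.mean(axis=0)

def build_uap(gradients: np.ndarray, eps: float, top_k: int) -> np.ndarray:
    """Assumed construction: perturb only the top_k pixels with the highest aggregated
    saliency, stepping by eps in the sign of the mean gradient (so ||delta||_inf <= eps)."""
    agg = aggregate_saliency(gradients)
    top_idx = np.argsort(agg.ravel())[::-1][:top_k]  # most salient pixels first
    direction = np.sign(gradients.mean(axis=0)).ravel()
    delta = np.zeros(agg.size)
    delta[top_idx] = eps * direction[top_idx]
    return delta.reshape(agg.shape)

def apply_uap(images: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Add the same perturbation to every image and clip to the valid [0, 1] range."""
    return np.clip(images + delta, 0.0, 1.0)
```

For example, `build_uap(grads, eps=8/255, top_k=200)` on gradients of shape `(8, 32, 100)` yields one `(32, 100)` perturbation that can be added to any input of that size. Restricting the perturbation to pixels that are salient across many samples, rather than for a single image, is what makes it universal in this sketch.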
Xu et al. studied this question.