Textual content faces escalating security threats, including copyright infringement, tampering, and unauthorized distribution. Text watermarking offers a vital defense mechanism by embedding imperceptible identifiers for source tracking and anti-counterfeiting. However, unlike general image watermarking, protecting text is uniquely challenging due to its highly discrete structure and low pixel redundancy, where even minute perturbations can compromise legibility. Over the past three decades, a wide range of text watermarking techniques has been proposed to address these challenges. While recent research has heavily favored semantic-based watermarking driven by Large Language Models (LLMs), these approaches are often inapplicable to high-stakes scenarios that require strict content integrity and visual fidelity, such as legal documentation and artistic font protection. To address this gap, this paper presents a comprehensive survey of semantic-preserving text watermarking methods developed in recent years, with a particular focus on image-based, font-based, and format-based techniques. We propose a unified classification framework to systematically analyze these approaches, examining their methodological principles, robustness, embedding capacity, and imperceptibility. By clarifying the core characteristics and limitations of existing techniques, this survey aims to provide a structured technical reference for researchers and practitioners and to facilitate the advancement of secure, robust, and scalable text protection technologies.
Meng et al. studied this question.