Generative LLMs are becoming increasingly powerful. Several detectors have been proposed to distinguish AI-generated from human-written text, with the goal of protecting text authenticity and integrity. A major challenge in zero-shot generated text detection is the so-called "capybara problem", where missing context causes detectors to misclassify unusual but contextually explainable linguistic features as human-written. To alleviate this issue, this paper proposes a detector-agnostic method that provides prompt context obtained through prompt inversion with an auxiliary LLM. By filtering out contextual linguistic features, the approach enables detectors to focus on stylistic cues indicative of generated text. Experiments on a diverse dataset covering multiple domains, LLMs, and adversarial manipulations show that incorporating prompt context improves detection performance by up to 5 % in AUC. Further evaluations of attack robustness and domain generalization show that AUC increases by up to 10 % for adversarially manipulated samples and that domain generalization accuracy increases by up to 6 %, underscoring the effectiveness of prompt context in enhancing generated text detection.
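As a rough illustration of the idea in the abstract, the following toy sketch shows how a recovered prompt could be supplied as context to a likelihood-based zero-shot detector. It is not the authors' implementation: `toy_invert_prompt` stands in for the auxiliary LLM, `toy_logprob` stands in for a real language model, and all function names and probability values are assumptions chosen only to make the example runnable.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
#   LogProbFn:  (history tokens, next token) -> log-probability of next token
#   InvertFn:   text -> guessed prompt that might have produced the text
LogProbFn = Callable[[Sequence[str], str], float]
InvertFn = Callable[[str], str]


def mean_log_likelihood(tokens: Sequence[str], logprob: LogProbFn,
                        context: Sequence[str] = ()) -> float:
    """Average token log-probability of `tokens`, optionally conditioned on `context`.

    Only the text tokens are scored; the context tokens just seed the history.
    """
    history: List[str] = list(context)
    total = 0.0
    for tok in tokens:
        total += logprob(history, tok)
        history.append(tok)
    return total / len(tokens)


def score_with_inverted_prompt(text: str, invert: InvertFn,
                               logprob: LogProbFn) -> float:
    """Score `text` with a guessed prompt prepended as context."""
    prompt = invert(text)
    return mean_log_likelihood(text.split(), logprob, prompt.split())


def toy_invert_prompt(text: str) -> str:
    """Stand-in for an auxiliary LLM: guess the topic from the longest word."""
    topic = max(text.split(), key=len)
    return f"write about {topic}"


def toy_logprob(history: Sequence[str], token: str) -> float:
    """Stand-in language model: tokens already seen in the history are likely."""
    return math.log(0.5) if token in history else math.log(0.01)


text = "the capybara swam"
without_context = mean_log_likelihood(text.split(), toy_logprob)
with_context = score_with_inverted_prompt(text, toy_invert_prompt, toy_logprob)
print(without_context, with_context)
```

In this toy setup the unusual word "capybara" is surprising without context but expected once the guessed prompt mentions it, so the conditioned score is higher. A real detector would replace both stand-ins with actual models and would typically compare the resulting score against a threshold.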
Philipp Dingfelder
Julia Hoffmann
Christian Riess
Dingfelder et al. studied this question.
www.synapsesocial.com/papers/69b6068883145bc643d1c8ea — DOI: https://doi.org/10.18420/sicherheit2026_14