This exploratory empirical study tests whether instructions embedded in documents can hijack AI summarisation workflows, and whether that vulnerability is predictable from model capability. Three documents (one honest control and two fabricated pharmaceutical papers written in different rhetorical registers) were processed by seventeen model configurations from three providers across approximately 350 runs. A subsequent controlled ablation (~251 runs) isolated the effects of register and addressivity. Compliance did not track capability tiers, model generations, or reasoning affordances. The two malicious documents produced different failure pathways: care-framed compliance persisted through credibility collapse, while authority-framed compliance did not. The paper introduces a compliance taxonomy, proposes a value activation hypothesis to explain the persistence of care framing, analyses how thinking mode amplifies cognitive biases, and situates the findings against a joint OpenAI/Anthropic/DeepMind study in which all twelve published defences tested were bypassed at rates above 90% under adaptive conditions. Paper 1 of 5 in the Confidence Curriculum series (10.5281/zenodo.19226032).
Ivan "HiP" Phan studied this question.
www.synapsesocial.com/papers/69cf5f425a333a821460e4a8 — DOI: https://doi.org/10.5281/zenodo.19365459