What question did this study set out to answer?

The study examines whether instructions embedded in documents can hijack AI summarisation, and whether that vulnerability can be predicted from model capability.

April 3, 2026 · Open Access

The Confidence Vulnerability: Unstable Judgment in Language Model Summarisation

Key Points

The study examines whether instructions embedded in documents can hijack AI summarisation, and whether that vulnerability can be predicted from model capability.
It tested seventeen model configurations from three providers across approximately 350 runs.
A controlled ablation of about 251 runs isolated the effects of rhetorical register and addressivity.
Compliance was analysed under two document framings: care and authority.
Compliance did not track capability tiers, model generations, or reasoning affordances.
The care-framed and authority-framed documents produced different failure pathways.
Under care framing, compliance persisted through credibility collapse; under authority framing it did not.

Abstract

Exploratory empirical study testing whether embedded instructions in documents can hijack AI summarisation workflows, and whether the vulnerability is predictable from model capability. Three documents (one honest control, two fabricated pharmaceutical papers using different rhetorical registers) were processed by seventeen model configurations from three providers across approximately 350 runs. A subsequent controlled ablation (~251 runs) isolated register and addressivity effects. Compliance did not track capability tiers, model generations, or reasoning affordances. The two malicious documents produced different failure pathways: care-framed compliance persisted through credibility collapse while authority-framed compliance did not. The paper introduces a compliance taxonomy, proposes a value activation hypothesis for care framing persistence, analyses thinking-mode amplification of cognitive biases, and situates the findings against a joint OpenAI/Anthropic/DeepMind study demonstrating that all twelve published defences tested were bypassed at over 90% under adaptive conditions.

Paper 1 of 5 in the Confidence Curriculum series (10.5281/zenodo.19226032).

Cite this study

Ivan "HiP" Phan studied this question.

www.synapsesocial.com/papers/69cf5f425a333a821460e4a8 — DOI: https://doi.org/10.5281/zenodo.19365459
