This paper documents a composite adversarial technique, Affective Prompt Injection via False Context Priming, in which locally deployed large language models are manipulated through a coordinated chain of conversational moves rather than a single adversarial prompt. The technique combines fabricated session history, emotionally amplified language, continuity framing, and encoded payload delivery to produce compliance drift: a staged erosion of a model's safety posture across successive conversation turns. Empirical testing was conducted on two models in separate environments: MiniMax M2.5 via Ollama on Kali Linux 2026.1, and Gemma3:1b via Ollama on Windows 11. Both models produced offensive security framework code after the attack chain was applied. Two failure modes not documented in prior literature were observed. The first is thinking-layer override, in which a chain-of-thought model's final response contradicts its own internal safety reasoning. The second is confabulatory context fabrication, in which the model generates a detailed but fictitious project history to stay narratively coherent with a false premise supplied by the adversary. In a separate test, delivering the payload in CyberChef 8-bit binary encoding was confirmed to circumvent keyword-based input filters. The three structural properties the technique exploits (statelessness, RLHF helpfulness conditioning, and a preference for narrative coherence) are broadly present in RLHF-trained, instruction-tuned models. Additional case studies on Qwen2.5:7b, Deepseek V3.1:671b-cloud, and Deepseek R1, run in both Linux and Windows environments, produced qualitatively similar end states under the same multi-stage attack chain. These results strengthen the hypothesis that the observed failure modes are structurally grounded, but systematic evaluation across a wider set of models remains future work.
Author: Sree Rahul Mukkavalli
www.synapsesocial.com/papers/69d894ce6c1944d70ce05bbe — DOI: https://doi.org/10.5281/zenodo.19462518