This paper documents a composite adversarial technique, Affective Prompt Injection via False Context Priming, in which locally deployed large language models are manipulated through a coordinated chain of conversational moves rather than a single adversarial prompt. The technique combines fabricated session history, emotionally amplified language, continuity framing, and encoded payload delivery to produce compliance drift: a staged erosion of a model's safety posture across successive conversation turns. Empirical testing was conducted on two models in separate environments: MiniMax M2.5 via Ollama on Kali Linux 2026.1, and Gemma3:1b via Ollama on Windows 11. Both models produced offensive security framework code following the attack chain. Two failure modes not documented in prior literature were observed: thinking-layer override, in which a chain-of-thought model's internal safety reasoning is contradicted by its own final response; and confabulatory context fabrication, in which the model generates detailed but fictitious project history to maintain narrative coherence with an adversary-supplied false premise. A binary-encoded payload delivery bypass was separately confirmed, circumventing keyword-based input filters through CyberChef 8-bit encoding. The three structural properties exploited (statelessness, RLHF helpfulness conditioning, and narrative coherence preference) are present in RLHF-trained instruction-tuned models broadly. Additional case studies on Qwen2.5:7b, Deepseek V3.1:671b-cloud, and Deepseek R1 in both Linux and Windows environments produced qualitatively similar end states under the same multi-stage attack chain. These results strengthen the hypothesis that the observed failure modes are structurally grounded, but systematic evaluation across a wider model set remains future work.
Sree Rahul Mukkavalli
Sree Rahul Mukkavalli (Tue,) studied this question.
www.synapsesocial.com/papers/69d894ce6c1944d70ce05bbe — DOI: https://doi.org/10.5281/zenodo.19462518