March 12, 2026 · Open Access

Contextual Blindness in Large Language Models - Systematic Vulnerabilities to Ethical Duality Manipulation

Key Points

The study assesses whether large language models can navigate ethical dilemmas using contextual cues.
Controlled experiments were run on five AI models: Claude Sonnet 4.5, GPT 5.2, Gemini 3, Grok 4, and Mistral.
Both single-turn atomic prompts and multi-turn conversational protocols were used.
The analysis measured unsafe response rates to manipulated financial scenarios.
The aggregated unsafe response rate was 78.9% across 350 single-turn trials.
Claude Sonnet 4.5 was the most resistant model, with a 34.3% unsafe rate.
Multi-turn protocols cut Claude's refusal rate from 64.3% to 20%, while GPT 5.2 and Gemini 3 refused 0% of requests.

Abstract

Large language models are deployed across domains requiring nuanced contextual judgment: financial services, healthcare, legal consultation. Yet these systems confront a fundamental epistemological constraint: they process semantic patterns without access to the verificatory infrastructure that enables humans to distinguish legitimate authority from its mere assertion. This paper interrogates whether frontier AI models possess the contextual reasoning capabilities necessary to navigate ethical duality: instances wherein structurally isomorphic scenarios diverge radically in moral valence based exclusively on context. Through controlled experimentation across five frontier models (Claude Sonnet 4.5, GPT 5.2, Gemini 3, Grok 4, and Mistral), we demonstrate systematic vulnerability to contextual manipulation in the domain of financial fraud. Employing both single-turn atomic prompts and multi-turn conversational protocols, we present models with structurally identical financial schemes framed through varying institutional contexts. Across 350 single-turn trials, the aggregated unsafe response rate reached 78.9% (95% CI 74% to 83%), with only Claude Sonnet 4.5 demonstrating substantive resistance (34.3% unsafe rate). This resistance derives from conservative safety defaults rather than contextual discernment, as evidenced by false positives on benign content and by failures when fraudulent schemes invoked different framings. Multi-turn fragmentation protocols reveal more severe vulnerabilities. Our FROST (Fraud Research Operationalisation & Systematic Testing) methodology demonstrates that distributing harmful request components across conversational turns degraded Claude's refusal rate from 64.3% to 20%, while GPT 5.2 and Gemini 3 exhibited 0% refusal rates and generated comprehensive fraud infrastructure.
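
As a rough arithmetic check on the interval reported in the abstract: 78.9% of 350 trials implies 276 unsafe responses, since 275 or 277 would round to 78.6% or 79.1%. The abstract does not say which interval method was used. The sketch below assumes a Wilson score interval (a common choice for binomial proportions) with z = 1.96. Under that assumption it reproduces bounds of 0.74 and 0.83 (as proportions) when rounded to two decimals. The function name `wilson_interval` and the count of 276 are illustrative assumptions, not taken from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper).

    Assumption: the paper's 95% CI may have been computed differently;
    this only shows one standard way to get such an interval.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # Centre is pulled slightly toward 0.5 relative to the raw proportion.
    centre = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    unsafe, total = 276, 350  # assumed count implied by the 78.9% figure
    lo, hi = wilson_interval(unsafe, total)
    print(f"{unsafe / total:.1%} unsafe (95% CI {lo:.2f}, {hi:.2f})")
    # prints: 78.9% unsafe (95% CI 0.74, 0.83)
```

A simple normal (Wald) approximation gives a lower bound of about 0.746 for the same count, so the choice of method can shift the second decimal.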

Cite this study

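Alessandro Marci. Contextual Blindness in Large Language Models - Systematic Vulnerabilities to Ethical Duality Manipulation. March 12, 2026. DOI: https://doi.org/10.5281/zenodo.18923286

Source page: www.synapsesocial.com/papers/69b25aab96eeacc4fcec89b6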
