April 7, 2024Open Access

Hidden You Malicious Goal Into Benigh Narratives: Jailbreak Large Language Models through Logic Chain Injection

Key Points

Key points are not available for this paper at this time.

Abstract

Jailbreak attacks on Language Model Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. Existing jailbreak attacks can successfully deceive the LLMs, however they cannot deceive the human. This paper proposes a new type of jailbreak attacks which can deceive both the LLMs and human (i.e., security analyst). The key insight of our idea is borrowed from the social psychology - that is human are easily deceived if the lie is hidden in truth. Based on this insight, we proposed the logic-chain injection attacks to inject malicious intention into benign truth. Logic-chain injection attack firstly dissembles its malicious target into a chain of benign narrations, and then distribute narrations into a related benign article, with undoubted facts. In this way, newly generate prompt cannot only deceive the LLMs, but also deceive human.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Cite this study

Wang et al. (Sun,) studied this question.

www.synapsesocial.com/papers/68e701fab6db64358767c041 — DOI: https://doi.org/10.48550/arxiv.2404.04849

Authors

Zhilong Wang

Yebo Cao

Peng Liu

Actions

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Hidden You Malicious Goal Into Benigh Narratives: Jailbreak Large Language Models through Logic Chain Injection

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Cite this study

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion