April 12, 2026

Open Access

RogueGPT: Unleashing Jailbreak Prompts on LLMs

Key Points

The aim is to investigate how jailbreak prompts can exploit LLMs to generate harmful responses.
Examine various LLMs like ChatGPT, Gemini, LLaMa, Vicuna, and Qwen.
Develop and categorize jailbreak prompts to test their effectiveness.
Calculate the Attack-to-Success Rate (ASR) for each LLM.
Perform human assessments alongside quantitative analyses.
Vicuna achieved the highest scores, with an ASR of 0.93 and an FT of 0.842.
LLaMa followed, with an ASR of 0.71 and an FT of 0.709.
The False Information category had the highest overall average, with an ASR of 0.864 and an FT of 0.96.
Findings highlight the vulnerabilities of popular LLMs to manipulation.

Abstract

Large Language Models (LLMs) have surged in popularity since late 2022 and have become vital tools for people across many professions. While some users leverage LLMs for academic or informational purposes, others exploit them for illicit activities. Methods of exploitation include Adversarial Attacks, Instruction Tuning Attacks, Inference Attacks, and Extraction Attacks. This paper investigates a specific Instruction Tuning Attack known as jailbreaking, which uses crafted prompts to manipulate LLMs into generating harmful responses to forbidden instructions. This study presents compelling evidence that widely used LLMs, such as OpenAI's ChatGPT, Google's Gemini, Meta's LLaMa, LMSYS's Vicuna, and Alibaba Cloud's Qwen, can be manipulated into generating responses that range from mildly illegal to potentially criminal content. Jailbreak prompts were created for each LLM, covering inquiries across several categories. The responses were categorized by the level of response elicited, and the Attack-to-Success Rate (ASR) was computed from these categorizations. These findings show how effective our prompts were on each LLM and how each model performed relative to the others. Vicuna produced the highest scores, with ASR (0.93) and FT (0.842), followed by LLaMa with ASR (0.71) and FT (0.709), indicating their vulnerability. The False Information category had the highest overall average, with ASR (0.864) and FT (0.96). Our conclusions were reached through a combination of human assessment and quantitative analysis, detailed in subsequent sections. By disseminating this research, we aim to encourage organizations to prioritize their security measures and to raise awareness among individuals about the responsible and ethical use of LLMs, given their potential for harm.
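The summary does not spell out how ASR is computed. A common reading is ASR = successful attacks / total attack attempts, and the sketch below assumes that definition. It also assumes binary success labels in place of the paper's graded response levels, and it uses hypothetical model names ("ModelA", "ModelB"), a hypothetical second category ("CategoryB"), and toy trial data rather than the study's results. FT is not defined in this summary, so it is not modeled here.

```python
from collections import defaultdict

# Hypothetical per-trial records: (model, category, succeeded).
# "succeeded" stands in for a human judgment that the model complied with
# the forbidden instruction. These are toy values, NOT the paper's data.
TRIALS = [
    ("ModelA", "False Information", True),
    ("ModelA", "False Information", True),
    ("ModelA", "CategoryB", False),
    ("ModelA", "CategoryB", True),
    ("ModelB", "False Information", True),
    ("ModelB", "False Information", False),
    ("ModelB", "CategoryB", False),
    ("ModelB", "CategoryB", False),
]


def attack_success_rate(outcomes):
    """ASR = successful attacks / total attempts (assumed definition)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no attempts to score")
    return sum(outcomes) / len(outcomes)  # True counts as 1, False as 0


def asr_by(trials, key_index):
    """Pool trials by model (key_index=0) or by category (key_index=1)."""
    groups = defaultdict(list)
    for record in trials:
        groups[record[key_index]].append(record[2])
    return {key: attack_success_rate(vals) for key, vals in groups.items()}


def category_average_over_models(trials):
    """Per category, average the per-(model, category) ASRs across models."""
    cells = defaultdict(list)
    for model, category, ok in trials:
        cells[(model, category)].append(ok)
    per_category = defaultdict(list)
    for (_model, category), vals in cells.items():
        per_category[category].append(attack_success_rate(vals))
    return {cat: sum(v) / len(v) for cat, v in per_category.items()}


if __name__ == "__main__":
    print(asr_by(TRIALS, 0))                     # per-model ASR
    print(category_average_over_models(TRIALS))  # per-category average
```

With balanced trial counts, as in this toy data, pooling all trials in a category and averaging per-model ASRs give the same value; with unequal counts per model they can differ, and the summary's "overall average" does not say which aggregation was used.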

Authors

Arpitha Shivaswaroopa

Vanshika Sood

H. L. Gururaj

Journals

Engineering Reports

Institutions

Manipal Academy of Higher Education

Near East University

Artificial Intelligence in Medicine (Canada)
