ABSTRACT Large Language Models (LLMs) have surged in popularity since late 2022 and have become part of daily work for people across many professions. While some users rely on LLMs for academic or informational purposes, others exploit them for illicit activities. Methods of exploitation include Adversarial Attacks, Instruction Tuning Attacks, Inference Attacks, and Extraction Attacks. This paper investigates a specific Instruction Tuning Attack known as jailbreaking, which uses crafted prompts to manipulate LLMs into generating harmful responses to forbidden instructions. The study presents evidence that widely used LLMs, such as OpenAI's ChatGPT, Google's Gemini, Meta's LLaMa, LMSYS's Vicuna, and Alibaba Cloud's Qwen, can be manipulated into generating responses that range from mildly illegal to potentially criminal content. Jailbreak prompts were created for each LLM, with inquiries spanning several categories. The prompts were categorized by the level of response they elicited, and the Attack-to-Success Rate (ASR) was computed alongside. The findings show how effective the prompts were on each LLM and how each model performed relative to the others. Vicuna was the most susceptible, with the highest scores (ASR 0.93, FT 0.842), followed by LLaMa (ASR 0.71, FT 0.709). Among the prompt categories, False Information had the highest overall average (ASR 0.864, FT 0.96). The conclusions were reached through a combination of human assessment and quantitative analysis, detailed in subsequent sections. By disseminating this research, the authors aim to encourage organizations to prioritize their security measures and to raise awareness among individuals about the responsible and ethical use of LLMs, given their potential for harm.
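The abstract reports ASR per model and per category but does not state the formula. The sketch below is one plausible reading, not the authors' procedure: it assumes ASR is the fraction of jailbreak attempts that a human assessor judged successful. The function names and the sample records are hypothetical and are not data from the paper.

```python
from collections import defaultdict


def attack_success_rate(outcomes):
    """Return the fraction of attempts marked successful.

    Assumption: ASR = successful attempts / total attempts.
    `outcomes` is an iterable of booleans (True = attack succeeded).
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no attempts recorded")
    return sum(outcomes) / len(outcomes)


def asr_by_group(records):
    """Group (label, success) pairs by label and compute ASR for each.

    The label can be a model name or a prompt category.
    """
    grouped = defaultdict(list)
    for label, success in records:
        grouped[label].append(bool(success))
    return {label: attack_success_rate(v) for label, v in grouped.items()}


# Hypothetical, illustrative records only (not results from the study).
records = [
    ("model_a", True), ("model_a", True), ("model_a", False), ("model_a", True),
    ("model_b", False), ("model_b", True),
]
rates = asr_by_group(records)
```

The same grouping function can be keyed on prompt category instead of model to obtain per-category averages of the kind the abstract reports.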
Arpitha Shivaswaroopa
Vanshika Sood
H. L. Gururaj
Engineering Reports
Manipal Academy of Higher Education
Near East University
Artificial Intelligence in Medicine (Canada)
Shivaswaroopa et al. studied this question.
www.synapsesocial.com/papers/69db36e64fe01fead37c4e0f — DOI: https://doi.org/10.1002/eng2.70069