Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work largely overlooks the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we evaluate the impact of various attack settings on LLM performance and provide a baseline benchmark for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks on six defense methods across two widely used datasets, encompassing approximately 320 experiments with about 50,000 GPU hours on A800-80G. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs. Our code is available at https://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking.
Xu et al. studied this question.
www.synapsesocial.com/papers/68e64f88b6db6435875e0181 — DOI: https://doi.org/10.48550/arxiv.2406.09324
Zhao Xu
Fan Liu
Hao Liu