June 13, 2024

Open Access

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

Abstract

Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work primarily overlooks the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we evaluate the impact of various attack settings on LLM performance and provide a baseline benchmark for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate the eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks on six defense methods across two widely used datasets, encompassing approximately 320 experiments with about 50,000 GPU hours on A800-80G. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs. Our code is available at https://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking.

Cite this study

Xu et al. (2024) studied this question.

www.synapsesocial.com/papers/68e64f88b6db6435875e0181 — DOI: https://doi.org/10.48550/arxiv.2406.09324

Authors

Zhao Xu

Fan Liu

Hao Liu
