May 30, 2024 · Open Access

SLM as Guardian: Pioneering AI Safety with Small Language Models

Abstract

Most prior safety research on large language models (LLMs) has focused on improving the alignment of LLMs to better meet human safety requirements. However, internalizing such safeguard features into larger models brings the challenges of higher training cost and unintended degradation of helpfulness. To overcome these challenges, a modular approach that employs a smaller LLM to detect harmful user queries is regarded as a convenient solution for designing LLM-based systems with safety requirements. In this paper, we leverage a smaller LLM for both harmful query detection and safeguard response generation. We introduce our safety requirements and a taxonomy of harmfulness categories, and then propose a multi-task learning mechanism that fuses the two tasks into a single model. We demonstrate the effectiveness of our approach, which matches or surpasses publicly available LLMs in harmful query detection and safeguard response performance.

Authors

Ohjoon Kwon

Donghyeon Jeon

Nayoung Choi
