March 3, 2026 · Open Access

BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models

Key Points

The benchmark offers a preliminary evaluation of Italian LLMs focusing on key safety aspects such as toxicity and bias.
Automated metrics and human judgments were used to assess the quality of translations from English to Italian.
Five state-of-the-art translation models were used to create the BeaverTails-IT dataset, enabling localized evaluation.
The analysis highlights the need for an Italian-specific safety benchmark and the challenges of using translated content as a proxy for a native one.

Abstract

Large Language Models (LLMs) have achieved remarkable success in generating human-like text and are increasingly integrated into real-world applications. However, their deployment raises significant safety concerns, including the risk of generating harmful, biased, or culturally inappropriate content. While several safety benchmarks exist for English, non-English contexts such as Italian remain critically underexplored, despite the growing demand for localized and culturally sensitive AI technologies. In this paper, we introduce BeaverTails-IT, the first Italian safety benchmark for LLMs, created through machine translation of the original English BeaverTails dataset. We employ five state-of-the-art translation models, evaluate translation quality using automated metrics and human judgments, and provide guidelines for selecting high-quality safety prompts. Our benchmark enables the preliminary evaluation of Italian LLMs across key safety dimensions such as toxicity, bias, and ethical compliance. Beyond presenting the translated dataset, we offer a detailed analysis of its limitations, highlighting the challenges of using translated content as a proxy for native benchmarks. Our findings demonstrate the need for a dedicated, culturally grounded Italian safety benchmark to ensure effective and contextually appropriate evaluations.
